MULTI-MICROPHONE ADAPTIVE NOISE REDUCTION TECHNIQUES
FOR SPEECH ENHANCEMENT



I. Background

In speech communication applications, such as teleconferencing, hands-free telephony and hearing aids, the presence of background noise and/or reverberation degrades the recorded speech signal. This degradation stems from the large distance between the desired speaker and the microphones. Hence, noise reduction is required. Multi-microphone systems exploit spatial information in addition to temporal and spectral information of the desired signal and noise signal, and are thus preferred to single microphone procedures (such as spectral subtraction). For aesthetic reasons, multi-microphone techniques for, e.g., hearing aid applications go together with the use of small-sized arrays. Such arrays, however, are sensitive to errors in the assumed signal model such as microphone mismatch, reverberation, ... [1, 2]. In hearing aids, microphones are rarely matched in gain and phase. In [3], e.g., gain and phase differences between microphone characteristics of up to 6 dB and 10°, respectively, have been reported.

A widely studied multi-channel adaptive noise reduction algorithm is the Generalized Sidelobe Canceller (GSC). The GSC consists of a fixed, spatial pre-processor, which includes a fixed beamformer and a blocking matrix, and an adaptive stage based on an Adaptive Noise Canceller (ANC) [12]. The ANC minimizes the output noise power, while the blocking matrix should avoid speech leakage into the noise references. The standard GSC assumes the desired speaker location, the microphone characteristics and the microphone positions to be known, and reflections of the speech signal to be absent. If these assumptions are fulfilled, it provides an undistorted speech signal with minimum residual noise. However, in reality these assumptions are often violated, resulting in so-called speech leakage and hence speech distortion. To limit speech distortion, the ANC is adapted during periods of noise only [7, 10]. When used in combination with small-sized arrays, an additional robustness constraint is required to guarantee performance in the presence of small errors in the assumed signal model, such as microphone mismatch [16, 17]. A widely applied method consists of imposing a Quadratic Inequality Constraint on the ANC (QIC-GSC) [14, 15, 18, 19]. For LMS updating, the Scaled Projection Algorithm (SPA) is a simple technique that imposes this constraint. However, using the QIC-GSC goes at the expense of less noise reduction [17].

Recently, a Multi-channel Wiener Filtering (MWF) technique has been proposed that provides a Minimum Mean Square Error (MMSE) estimate of the desired signal portion in one of the received microphone signals [21]-[24]. In contrast to the ANC of the GSC, the MWF is able to take speech distortion into account in its optimization criterion. The MWF can also be generalized to allow for a trade-off between speech distortion and noise reduction. We will refer to this generalization as the Speech Distortion Weighted MWF (SDW-MWF). The MWF technique is based on estimates of the second order statistics of the recorded speech signal and the noise signal.

In contrast to the GSC, the MWF does not make any a priori assumptions about the signal model, so that no or a less severe robustness constraint is needed to guarantee performance when used in combination with small-sized arrays [16, 17]. Especially in complicated noise scenarios such as multiple noise sources or diffuse noise, the MWF outperforms the GSC, even when the GSC is supplemented with a robustness constraint [17].

In [20, 21], the implementation of the MWF is based on a Generalized Singular Value Decomposition (GSVD) of an input data matrix and a noise data matrix. A cheaper alternative based on a QR Decomposition (QRD) has been proposed in [22]. A subband implementation [23] results in improved intelligibility at a significantly lower cost compared to the fullband approach. However, in contrast to the GSC and the QIC-GSC [14], no efficient, cheap stochastic gradient based implementation of the (SDW-)MWF, which avoids the use of expensive matrix computations, is available yet. In [25], an LMS based algorithm for the MWF has been developed. This algorithm needs recordings of calibration signals. Since room acoustics, microphone characteristics and the location of the desired speaker change over time, frequent re-calibration is required, making this approach cumbersome and expensive. In [26], an LMS based SDW-MWF has been proposed that avoids the need for calibration signals. The algorithm, however, relies on some independence assumptions that are not necessarily satisfied, resulting in degraded performance compared to the matrix-based implementations.

II. Summary

In the present invention, we establish a generalized multi-channel noise reduction scheme, referred to as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF), that encompasses the GSC and the MWF as extreme cases. In addition, the scheme allows for in-between solutions such as the Speech Distortion Regularized GSC (SDR-GSC). The generalized scheme, depicted in Figure 3, consists of a fixed, spatial pre-processor and an adaptive stage that is based on an SDW-MWF.

The SP-SDW-MWF adds robustness against signal model errors to the GSC by taking speech distortion explicitly into account in the design criterion of the adaptive stage. The SP-SDW-MWF is an alternative technique to the widely studied QIC-GSC to decrease the sensitivity of the GSC to signal model errors such as microphone mismatch, reverberation, ... A parameter μ is incorporated in the SP-SDW-MWF that allows for a trade-off between speech distortion and noise reduction. Focussing all attention towards speech distortion (i.e., setting μ = 0) results in the output of the fixed beamformer. In noise scenarios with very low Signal-to-Noise Ratio (SNR), e.g., -10 dB, a fixed beamformer may be preferred. Adaptivity can be easily reduced or excluded in the SP-SDW-MWF by decreasing the parameter μ to 0. Compared to the widely studied QIC-GSC, the SP-SDW-MWF achieves a better noise reduction performance for a given maximum allowable speech distortion level.

In [22, 27], recursive implementations of the (SDW-)MWF have been proposed based on a GSVD or QR decomposition. A subband implementation [28] results in improved intelligibility at a significantly lower cost compared to the fullband approach. These techniques can be extended to implement the SP-SDW-MWF [29]. However, in contrast to the GSC and the QIC-GSC [14], no cheap stochastic gradient based implementation of the SP-SDW-MWF is available. In the present invention, we propose time-domain and frequency-domain stochastic gradient implementations of the SP-SDW-MWF that preserve the benefit of the matrix-based SP-SDW-MWF over QIC-GSC.

Below, the different embodiments of the present invention are described.

The first embodiment proposes a Speech Distortion Regularized GSC (SDR-GSC). A new design criterion is developed for the adaptive stage of the GSC: the ANC design criterion is supplemented with a regularization term that limits speech distortion due to signal model errors. In the SDR-GSC, a parameter μ is incorporated that allows for a trade-off between speech distortion and noise reduction. Focussing all attention on noise reduction results in the standard GSC, while, on the other hand, focussing all attention on speech distortion results in the output of the fixed beamformer. In noise scenarios with low SNR, adaptivity in the SDR-GSC can be easily reduced or excluded by increasing attention towards speech distortion, i.e., by decreasing the parameter μ to 0. The SDR-GSC is an alternative technique to the QIC-GSC to decrease the sensitivity of the GSC to signal model errors such as microphone mismatch, reverberation, ... In contrast to the QIC-GSC, the SDR-GSC shifts emphasis towards speech distortion when the amount of speech leakage grows. In the absence of signal model errors, the performance of the GSC is preserved. As a result, a better noise reduction performance is obtained for small model errors, while guaranteeing robustness against large model errors.

In a second embodiment, we further improve the noise reduction performance of the SDR-GSC by adding an extra adaptive filtering operation w0 on the speech reference signal. We refer to this generalized scheme as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF). The SP-SDW-MWF is depicted in Figure 3 and encompasses the MWF [20] as a special case. Again, a parameter μ is incorporated in the design criterion to allow for a trade-off between speech distortion and noise reduction. Focussing all attention on speech distortion results in the output of the fixed beamformer. Also here, adaptivity can be easily reduced or excluded by decreasing μ to 0. It is shown that, in the absence of speech leakage and for infinitely long filter lengths, the SP-SDW-MWF corresponds to a cascade of an SDR-GSC with a Speech Distortion Weighted Single-channel Wiener Filter (SDW-SWF) postfilter [30] and thus outperforms the SDR-GSC. In the presence of speech leakage, the SP-SDW-MWF with w0 tries to preserve its performance: compared to an SDR-GSC (with SDW-SWF postfilter), the SP-SDW-MWF then contains extra filtering operations that compensate for the performance degradation of the SDR-GSC (with SDW-SWF) due to speech leakage. In contrast to the SDR-GSC (and thus also the GSC), its performance does not degrade due to microphone mismatch. In [22, 27], recursive implementations of the (SDW-)MWF have been proposed based on a GSVD or QR decomposition. A subband implementation [28] results in improved intelligibility at a significantly lower cost compared to the fullband approach. These techniques can be extended to implement the SDR-GSC and, more generally, the SP-SDW-MWF.

In a third embodiment, we propose cheap time-domain and frequency-domain stochastic gradient implementations of the SDR-GSC and SP-SDW-MWF. Starting from the design criterion of the SDR-GSC, or more generally, the SP-SDW-MWF, we derive a time-domain stochastic gradient algorithm. In addition, we modify the LMS based algorithm [26] so that it applies to the SP-SDW-MWF. To increase convergence and reduce complexity, a frequency-domain implementation has been proposed. Both the stochastic gradient and the LMS based algorithm suffer from a large excess error when applied in highly time-varying noise scenarios. We show that the excess error in the stochastic gradient algorithm is reduced by applying a low pass filter to the part of the gradient estimate that limits speech distortion. The low pass filtering avoids a highly time-varying distortion of the desired speech component while not degrading the tracking performance needed in time-varying noise scenarios. The stochastic gradient SP-SDW-MWF outperforms the LMS based algorithm, while complexity is not increased. Experimental results show that the low pass filtering significantly improves the performance of the stochastic gradient algorithm and does not compromise the tracking of changes in the noise scenario. In addition, experiments demonstrate that the proposed stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over QIC-GSC. The limited computational cost and the better noise reduction performance of the proposed algorithm make it a good alternative to the SPA [14] for implementation in hearing aids.
Brief Description of the Drawings

A number of embodiments of the present invention, together with some aspects of the prior art, will now be described with reference to the drawings, in which:

Fig. 1 depicts the concept of a Generalized Sidelobe Canceller;

Fig. 2 depicts an equivalent approach of multi-channel Wiener filtering;

Fig. 3 depicts a Spatially Pre-processed SDW-MWF;

Fig. 4 depicts the decomposition of SP-SDW-MWF with w0 in a multi-channel filter wd and single-channel postfilter e1 - w0;

Fig. 5 shows the influence of 1/μ on the performance of the SDR-GSC for different gain mismatches Υ2 at the second microphone;

Fig. 6 shows the influence of 1/μ on the performance of the SP-SDW-MWF with w0 for different gain mismatches Υ2 at the second microphone;

Fig. 7 shows the ΔSNR_intellig and SD_intellig for QIC-GSC as a function of β² for different gain mismatches Υ2 at the second microphone;

Fig. 8 depicts the complexity of TD and FD Stochastic Gradient (SG) algorithm with LP filtering as a function of filter length L per channel; M = 3 (for comparison, the complexity of the standard NLMS ANC and SPA are depicted too);

Fig. 9 depicts the performance of different FD Stochastic Gradient (FD-SG) algorithms; (a) Stationary speech-like noise at 90°; (b) Multi-talker babble noise at 90°;

Fig. 10 depicts the influence of LP filter on performance of FD stochastic gradient SP-SDW-MWF (1/μ = 0.5) without w0 and with w0. Babble noise at 90°;

Fig. 11 depicts the convergence behavior of FD-SG for λ = 0 and λ = 0.9998. The noise source position suddenly changes from 90° to 180° and vice versa;

Fig. 12 depicts the performance of FD stochastic gradient implementation of SP-SDW-MWF with LP (λ = 0.9998) in a multiple noise source scenario; and

Fig. 13 depicts the performance of FD SPA in a multiple noise source scenario.

Detailed Description 
Before the invention is described in detail, the prior art GSC [4] and the QIC-GSC [14, 19] will be reviewed under section 1. Under section 2, the Multi-channel Wiener Filter (MWF) technique will be discussed [20].

1 Generalized Sidelobe Canceller (GSC)
1.1 Concept 



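Figure 1 describes the concept of the Generalized Sidelobe Canceller (GSC) [4], which consists of a fixed, spatial pre-processor, i.e., a fixed beamformer $\mathbf{A}(z)$ and a blocking matrix $\mathbf{B}(z)$, and an ANC. Given $M$ microphone signals

\[
u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1, \ldots, M \tag{1}
\]

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise contribution, the fixed beamformer $\mathbf{A}(z)$ (e.g., delay-and-sum) creates a so-called speech reference

\[
y_0[k] = y_0^s[k] + y_0^n[k], \tag{2}
\]

by steering a beam towards the direction of the desired signal, with a speech contribution $y_0^s[k]$ and a noise contribution $y_0^n[k]$. The desired signal is assumed to be in front at 0°. The blocking matrix $\mathbf{B}(z)$ creates $M-1$ so-called noise references

\[
y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1, \ldots, M-1 \tag{3}
\]

by steering zeroes towards the front so that the noise contributions $y_i^n[k]$ are dominant compared to the speech leakage contributions $y_i^s[k]$. In the sequel, the superscripts $s$ and $n$ are used to refer to the speech and noise contribution of a signal. During periods of speech + noise, the references $y_i[k]$, $i = 0, \ldots, M-1$ contain speech + noise. During periods of noise only, $y_i[k]$, $i = 0, \ldots, M-1$ only consist of a noise component, i.e., $y_i[k] = y_i^n[k]$. The second order statistics of the noise signal are assumed to be quite stationary such that they can be estimated during periods of noise only.

To design the fixed, spatial pre-processor, assumptions are made about the microphone characteristics, the speaker position and the microphone positions, and furthermore reverberation is assumed to be absent. If these assumptions are satisfied, the noise references do not contain any speech, i.e., $y_i^s[k] = 0$ for $i = 1, \ldots, M-1$. However, in practice, these assumptions are often violated (e.g., due to microphone mismatch and reverberation) so that speech leaks into the noise references. To limit the effect of such speech leakage, the ANC filter¹ $\mathbf{w}_{1:M-1} \in \mathbb{C}^{(M-1)L \times 1}$,

\[
\mathbf{w}_{1:M-1}^H = \begin{bmatrix} \mathbf{w}_1^H & \mathbf{w}_2^H & \cdots & \mathbf{w}_{M-1}^H \end{bmatrix}, \tag{4}
\]

where

\[
\mathbf{w}_i = \begin{bmatrix} w_i[0] & w_i[1] & \cdots & w_i[L-1] \end{bmatrix}^T, \tag{5}
\]

is adapted during periods of noise only [7, 13]. Hence, the ANC $\mathbf{w}_{1:M-1}$ minimizes the output noise power, i.e.,

\[
\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} \mathcal{E}\left\{ \left| y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^n[k] \right|^2 \right\} \tag{6}
\]

and equals

\[
\mathbf{w}_{1:M-1} = \mathcal{E}\left\{ \mathbf{y}_{1:M-1}^n[k]\, \mathbf{y}_{1:M-1}^{n,H}[k] \right\}^{-1} \mathcal{E}\left\{ \mathbf{y}_{1:M-1}^n[k]\, y_0^{n,*}[k-\Delta] \right\}, \tag{7}
\]

where

\[
\mathbf{y}_{1:M-1}^{n,H}[k] = \begin{bmatrix} \mathbf{y}_1^{n,H}[k] & \mathbf{y}_2^{n,H}[k] & \cdots & \mathbf{y}_{M-1}^{n,H}[k] \end{bmatrix}, \tag{8}
\]

\[
\mathbf{y}_i^n[k] = \begin{bmatrix} y_i^n[k] & y_i^n[k-1] & \cdots & y_i^n[k-L+1] \end{bmatrix}^T, \tag{9}
\]

and where $\Delta$ is a delay applied to the speech reference to allow for non-causal taps in the filter $\mathbf{w}_{1:M-1}$. The delay $\Delta$ is usually set to $\lceil L/2 \rceil$, where $\lceil x \rceil$ returns the smallest integer equal to or larger than $x$. The subscript $1:M-1$ in $\mathbf{w}_{1:M-1}$ and $\mathbf{y}_{1:M-1}$ refers to the subscripts of the first and last channel component of the adaptive filter and input vector, respectively.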

Under ideal conditions ($y_i^s[k] = 0$, $i = 1, \ldots, M-1$), the GSC minimizes the residual noise while not distorting the desired speech signal, i.e., $z^s[k] = y_0^s[k-\Delta]$. However, when used in combination with small-sized arrays, a small error in the assumed signal model (hence $y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$) already suffices to produce a significantly distorted output speech signal $z^s[k]$

\[
z^s[k] = y_0^s[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^s[k], \tag{10}
\]

even when only adapting during noise-only periods, so a robustness constraint on $\mathbf{w}_{1:M-1}$ is required [17]. In addition, the fixed beamformer $\mathbf{A}(z)$ should be designed so that the distortion in the speech reference $y_0^s[k]$ is minimal for all possible model errors. In the sequel, a delay-and-sum beamformer is used. For small-sized arrays, this beamformer offers sufficient robustness against signal model errors, as it minimizes the white noise gain or noise sensitivity. Given statistical knowledge about the signal model errors that occur in practice, further optimized beamformers can be designed, e.g., using the techniques in [31].
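The noise-only adaptation of the ANC described in this section can be sketched as follows. This is a minimal illustration, assuming real-valued signals, a single noise reference (M = 2) and a normalized LMS update; the function name `nlms_anc`, the step size and the synthetic test signals are hypothetical and are not taken from the text.

```python
import random

def nlms_anc(y0, y1, noise_only, L=4, delay=2, step=0.5, eps=1e-8):
    """ANC with one noise reference, adapted only when noise_only[k] is True.

    y0: speech reference, y1: noise reference (real-valued lists).
    Returns the filter w and the output z[k] = y0[k - delay] - w^T y1_vec[k].
    """
    w = [0.0] * L
    z = []
    for k in range(len(y0)):
        # Regressor [y1[k], y1[k-1], ..., y1[k-L+1]] with zero initial state
        x = [y1[k - j] if k - j >= 0 else 0.0 for j in range(L)]
        d = y0[k - delay] if k - delay >= 0 else 0.0
        e = d - sum(wi * xi for wi, xi in zip(w, x))
        z.append(e)
        if noise_only[k]:  # freeze the filter while speech is present
            norm = sum(xi * xi for xi in x) + eps
            w = [wi + step * e * xi / norm for wi, xi in zip(w, x)]
    return w, z

# Synthetic noise-only example: the noise in the speech reference is a
# scaled, one-sample-delayed copy of the noise reference.
random.seed(0)
n = [random.gauss(0.0, 1.0) for _ in range(2000)]
y1 = n
y0 = [0.0] + [0.8 * v for v in n[:-1]]
w, z = nlms_anc(y0, y1, [True] * len(n))
```

With the delay of 2 samples applied to the speech reference, the noise in `y0[k - 2]` equals `0.8 * y1[k - 3]`, so the adapted filter is expected to concentrate on its last tap.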

¹ In a time-domain implementation, the input signals of the adaptive filter $\mathbf{w}_{1:M-1}$ and the filter $\mathbf{w}_{1:M-1}$ are real. In the sequel, the formulas are generalized to complex input signals so that they can also be applied to a subband implementation.



1.2 Quadratic Inequality Constraint (QIC-GSC)

A common approach to increase the robustness of the GSC is to apply a Quadratic Inequality Constraint (QIC) [9]-[14, 19] to the ANC filters so that the optimization criterion (6) of the GSC is modified into

\[
\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} \mathcal{E}\left\{ \left| y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^n[k] \right|^2 \right\} \quad \text{subject to} \quad \mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1} \leq \beta^2. \tag{11}
\]

The QIC avoids excessive growth of the filter coefficients $\mathbf{w}_{1:M-1}$. Hence, it reduces the undesired speech distortion when speech leaks into the noise references. In [14, 19], it is shown that, for a GSC with a blocking matrix $\mathbf{B}(f)$ that satisfies $\mathbf{B}^H(f)\mathbf{B}(f) = \mathbf{I}$, the QIC on the ANC filters corresponds to a constraint on the noise sensitivity.

In [14], the QIC-GSC is implemented by using the adaptive scaled projection algorithm: at each update step, the quadratic constraint is applied to the newly obtained ANC filter by scaling the filter coefficients by $\beta / \|\mathbf{w}_{1:M-1}\|$ when $\mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1}$ exceeds $\beta^2$. Although this technique works well for LMS updating, it does not appear to be as effective for RLS [19]. Recently, Tian et al. implemented the quadratic constraint by using variable loading [19]. For RLS, this technique provides a better approximation to the optimal solution (11) than the scaled projection algorithm. For LMS, variable loading does not appear to offer any performance advantage over the cheaper, scaled projection LMS.
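The scaling step of the scaled projection algorithm can be sketched as below. This is a minimal sketch assuming real-valued filter coefficients stored in a flat list; the function name `scaled_projection` and the parameter name `beta2` (standing for the squared bound in the constraint) are chosen here for illustration only.

```python
import math

def scaled_projection(w, beta2):
    """Scale w back onto the ball w^T w <= beta2 if the constraint is violated."""
    norm2 = sum(c * c for c in w)
    if norm2 > beta2:
        scale = math.sqrt(beta2 / norm2)
        return [scale * c for c in w]
    return list(w)  # constraint already satisfied: coefficients unchanged

# Example: a filter with squared norm 25 is scaled to squared norm 1
w_constrained = scaled_projection([3.0, 4.0], 1.0)
w_unchanged = scaled_projection([0.3, 0.4], 1.0)
```

In an LMS-type implementation this function would be applied to the newly updated filter after every adaptation step.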

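2 Multi-channel Wiener filtering (MWF)
2.1 Concept

Recently, a Multi-channel Wiener filtering (MWF) technique has been proposed that provides a Minimum Mean Square Error (MMSE) estimate of the desired signal portion in one of the received microphone signals [21, 22, 23, 24]. In contrast to the GSC, this filtering technique does not make any a priori assumptions about the signal model and is found to be more robust [16, 17, 21]. Especially in complicated noise scenarios such as multiple noise sources or diffuse noise, the MWF outperforms the GSC, even when the GSC is supplied with a robustness constraint [17].

The MWF $\mathbf{w}_{1:M} \in \mathbb{C}^{ML \times 1}$ minimizes the Mean Square Error (MSE) between a delayed version of the (unknown) speech signal $u_i^s[k-\Delta]$ at the $i$-th (e.g., first) microphone and the sum $\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]$ of the $M$ filtered, received microphone signals:

\[
\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} \mathcal{E}\left\{ \left| u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k] \right|^2 \right\}, \tag{12}
\]

leading to:

\[
\mathbf{w}_{1:M} = \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \right\}^{-1} \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, u_i^{s,*}[k-\Delta] \right\}, \tag{13}
\]

with

\[
\mathbf{w}_{1:M}^H = \begin{bmatrix} \mathbf{w}_1^H & \mathbf{w}_2^H & \cdots & \mathbf{w}_M^H \end{bmatrix}, \tag{14}
\]

\[
\mathbf{u}_{1:M}^H[k] = \begin{bmatrix} \mathbf{u}_1^H[k] & \mathbf{u}_2^H[k] & \cdots & \mathbf{u}_M^H[k] \end{bmatrix}, \tag{15}
\]

\[
\mathbf{u}_i[k] = \begin{bmatrix} u_i[k] & u_i[k-1] & \cdots & u_i[k-L+1] \end{bmatrix}^T. \tag{16}
\]

An equivalent approach consists in estimating a delayed version of the (unknown) noise signal $u_i^n[k-\Delta]$ in the $i$-th microphone, resulting in

\[
\tilde{\mathbf{w}}_{1:M} = \arg\min_{\tilde{\mathbf{w}}_{1:M}} \mathcal{E}\left\{ \left| u_i^n[k-\Delta] - \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k] \right|^2 \right\} \tag{17}
\]

and

\[
\tilde{\mathbf{w}}_{1:M} = \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \right\}^{-1} \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, u_i^{n,*}[k-\Delta] \right\}, \tag{18}
\]

where

\[
\tilde{\mathbf{w}}_{1:M}^H = \begin{bmatrix} \tilde{\mathbf{w}}_1^H & \tilde{\mathbf{w}}_2^H & \cdots & \tilde{\mathbf{w}}_M^H \end{bmatrix}. \tag{19}
\]

The estimate of the speech component $u_i^s[k-\Delta]$ is then obtained by subtracting the estimate $\hat{u}_i^n[k-\Delta] = \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]$ from the delayed, $i$-th microphone signal $u_i[k-\Delta]$, i.e.,

\[
\hat{u}_i^s[k-\Delta] = u_i[k-\Delta] - \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]. \tag{20}
\]

This is depicted in Figure 2 for $u_i^s[k-\Delta] = u_1^s[k-\Delta]$. Using (13) and (18), it can be easily shown that

\[
\tilde{\mathbf{w}}_{1:M} + \mathbf{w}_{1:M} = \mathbf{e}_{(i-1)L+\Delta}, \tag{21}
\]

with $\mathbf{e}_l$ the $l$-th canonical vector, defined as

\[
\mathbf{e}_l = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}^T \quad \text{(1 in the $l$-th position)}. \tag{22}
\]

This shows that the two approaches indeed lead to exactly the same speech signal estimate. A procedure for computing $\mathbf{w}_{1:M}$ or $\tilde{\mathbf{w}}_{1:M}$ will be given in Section 2.3.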



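2.2 Trade-off speech distortion versus noise reduction (SDW-MWF)

The residual error energy equals

\[
\varepsilon^2 = \mathcal{E}\left\{ \left| u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k] \right|^2 \right\} \tag{23}
\]

and can be decomposed as

\[
\varepsilon^2 = \underbrace{\mathcal{E}\left\{ \left| u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k] \right|^2 \right\}}_{\varepsilon_d^2} + \underbrace{\mathcal{E}\left\{ \left| \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k] \right|^2 \right\}}_{\varepsilon_n^2}, \tag{24}
\]

where $\varepsilon_d^2$ equals the speech distortion energy and $\varepsilon_n^2$ the residual noise energy. The design criterion of the MWF can be generalized to allow for a trade-off between speech distortion and noise reduction, by incorporating a weighting factor $\mu$ [20] with $\mu \in [0, \infty]$:

\[
\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} \mathcal{E}\left\{ \left| u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k] \right|^2 \right\} + \mu\, \mathcal{E}\left\{ \left| \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k] \right|^2 \right\}. \tag{25}
\]

The solution of (25) is given by

\[
\mathbf{w}_{1:M} = \mathcal{E}\left\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] + \mu\, \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \right\}^{-1} \mathcal{E}\left\{ \mathbf{u}_{1:M}^s[k]\, u_i^{s,*}[k-\Delta] \right\}, \tag{26}
\]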

which corresponds to the Wiener formula with an adjustable input noise level. Note that (13) is obtained with $\mu = 1$ and that (21) still applies. The filter (26) corresponds to the time-domain constrained estimator proposed in [32], which optimizes the following criterion:

\[
\min_{\mathbf{w}_{1:M}} \varepsilon_d^2 \quad \text{subject to} \quad \varepsilon_n^2 \leq \alpha\, \mathcal{E}\left\{ \left| u_i^n[k-\Delta] \right|^2 \right\}, \tag{27}
\]

where $0 \leq \alpha \leq 1$ and $\mu$ is the Lagrange multiplier.

Equivalently, the optimization criterion for $\tilde{\mathbf{w}}_{1:M}$ in (17) can be modified into



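\[
\tilde{\mathbf{w}}_{1:M} = \arg\min_{\tilde{\mathbf{w}}_{1:M}} \mathcal{E}\left\{ \left| \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^s[k] \right|^2 \right\} + \mu\, \mathcal{E}\left\{ \left| u_i^n[k-\Delta] - \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^n[k] \right|^2 \right\}, \tag{28}
\]

resulting in

\[
\tilde{\mathbf{w}}_{1:M} = \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] + \frac{1}{\mu}\, \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \right\}^{-1} \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta] \right\}. \tag{29}
\]

In the sequel, we will refer to (29) as the Speech Distortion Weighted Multi-channel Wiener Filter (SDW-MWF).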

The factor $\mu \in [0, \infty]$ trades off speech distortion versus noise reduction. If $\mu = 1$, the MMSE criterion (12) or (17) is obtained. If $\mu > 1$, the residual noise level will be reduced at the expense of increased speech distortion. By setting $\mu$ to $\infty$, all emphasis is put on noise reduction and speech distortion is completely ignored. This results in $\mathbf{w}_{1:M} = \mathbf{0}$ or $\tilde{\mathbf{w}}_{1:M} = \mathbf{e}_{(i-1)L+\Delta}$, which means that the output signal equals 0. Setting $\mu$ to 0, on the other hand, results in $\mathbf{w}_{1:M} = \mathbf{e}_{(i-1)L+\Delta}$ or $\tilde{\mathbf{w}}_{1:M} = \mathbf{0}$ and hence in no noise reduction.
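The trade-off governed by μ can be illustrated numerically. The sketch below assumes two microphones, a filter length of one tap, zero delay and real-valued signals, so that the correlation vectors in (26) and (29) reduce to a column of the speech and noise correlation matrices. The helper names (`solve2`, `sdw_mwf`, `residual_noise`) and the example matrices are hypothetical and only serve the illustration.

```python
def solve2(A, b):
    """Solve the 2x2 real system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def sdw_mwf(Rs, Rn, mu, i=0):
    """Speech estimator w as in (26) and noise estimator wt as in (29).

    Rs, Rn: 2x2 speech and noise correlation matrices; mu > 0;
    i: index of the microphone whose speech component is estimated.
    """
    A = [[Rs[r][c] + mu * Rn[r][c] for c in range(2)] for r in range(2)]
    w = solve2(A, [Rs[0][i], Rs[1][i]])
    At = [[Rn[r][c] + Rs[r][c] / mu for c in range(2)] for r in range(2)]
    wt = solve2(At, [Rn[0][i], Rn[1][i]])
    return w, wt

def residual_noise(w, Rn):
    """Residual noise energy w^T Rn w."""
    return sum(w[r] * Rn[r][c] * w[c] for r in range(2) for c in range(2))

Rs = [[1.0, 0.9], [0.9, 1.0]]   # strongly correlated speech (assumed values)
Rn = [[0.5, 0.1], [0.1, 0.5]]   # weakly correlated noise (assumed values)
results = {mu: sdw_mwf(Rs, Rn, mu) for mu in (0.1, 1.0, 10.0)}
noise_levels = [residual_noise(results[mu][0], Rn) for mu in (0.1, 1.0, 10.0)]
```

For every value of μ the two filters sum to the canonical vector, in line with (21), while the residual noise energy decreases as μ grows.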

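2.3 Implementation of MWF

In practice, the correlation matrix $\mathcal{E}\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\}$ is unknown. During periods of speech, the inputs $u_i[k]$ consist of speech + noise, i.e., $u_i[k] = u_i^s[k] + u_i^n[k]$, $i = 1, \ldots, M$. During periods of noise, only the noise component $u_i^n[k]$ is observed. Assuming that the speech and noise signal are uncorrelated, $\mathcal{E}\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\}$ can be estimated as

\[
\mathcal{E}\left\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \right\} = \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \right\} - \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \right\}, \tag{30}
\]

where the second order statistics $\mathcal{E}\{\mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k]\}$ are estimated during speech + noise and the statistics $\mathcal{E}\{\mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k]\}$ during periods of noise only. Like for the GSC, a robust speech detection is thus needed. Using (30), (29) and (26) can be re-written as:

\[
\tilde{\mathbf{w}}_{1:M} = \left( \frac{1}{\mu}\, \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \right\} + \left( 1 - \frac{1}{\mu} \right) \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \right\} \right)^{-1} \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta] \right\} \tag{31}
\]

and

\[
\mathbf{w}_{1:M} = \left( \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \right\} + (\mu - 1)\, \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \right\} \right)^{-1} \times \left( \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, u_i^*[k-\Delta] \right\} - \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta] \right\} \right). \tag{32}
\]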



In [21], the Wiener filter is computed at each time instant $k$ by means of a Generalized Singular Value Decomposition (GSVD) of a speech + noise and a noise data matrix. A cheaper recursive alternative based on a QR-decomposition has been proposed in [22]. In [23, 24], a subband implementation has been developed to increase intelligibility and reduce complexity, making it suitable for hearing aid applications.

Finally, note that instead of estimating $\mathcal{E}\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\}$ online using (30), a predetermined estimate of $\mathcal{E}\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\}$ is sometimes used [25, 33]. In [25], this estimate is derived from clean speech recordings measured during an initial calibration phase. Additional recordings of the source speech signal make it possible to produce an estimate of the non-reverberant source speech signal instead of an estimate of the reverberant speech component in one of the microphone signals. However, since the room acoustics, the position of the desired speaker and the microphone characteristics may change over time, frequent re-calibration is required. In [33], a mathematical estimate of the correlation matrix and the correlation vector of the non-reverberant speech is exploited in which some signal model errors are taken into account.
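The online estimate in (30) can be sketched as follows: correlation matrices are averaged separately over speech + noise frames and noise-only frames, and then subtracted. The sketch assumes real-valued input vectors and a given per-frame speech detection flag; the function names and the small deterministic data set are hypothetical.

```python
def corr_matrix(vectors):
    """Sample correlation matrix (1/N) * sum of u u^T for real-valued vectors."""
    n = len(vectors[0])
    R = [[0.0] * n for _ in range(n)]
    for u in vectors:
        for r in range(n):
            for c in range(n):
                R[r][c] += u[r] * u[c] / len(vectors)
    return R

def speech_corr_estimate(vectors, speech_active):
    """Estimate the speech correlation matrix as R_u - R_n, following (30)."""
    noisy = [u for u, a in zip(vectors, speech_active) if a]
    noise = [u for u, a in zip(vectors, speech_active) if not a]
    Ru, Rn = corr_matrix(noisy), corr_matrix(noise)
    n = len(Ru)
    Rs = [[Ru[r][c] - Rn[r][c] for c in range(n)] for r in range(n)]
    return Rs, Rn

# Deterministic toy data: speech and noise samples chosen so that their
# cross terms cancel exactly, mimicking uncorrelated speech and noise.
speech = [[1.0, 1.0], [-1.0, -1.0]]
noise = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
noisy_frames = [[s[0] + v[0], s[1] + v[1]] for s in speech for v in noise]
frames = noisy_frames + noise
flags = [True] * len(noisy_frames) + [False] * len(noise)
Rs_est, Rn_est = speech_corr_estimate(frames, flags)
```

In this toy case the estimate recovers the true speech correlation matrix exactly; with real recordings the subtraction is only approximate and depends on the quality of the speech detection.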



In this Section, the present invention is described in detail.
In Section 3, the proposed adaptive multi-channel noise reduction technique, referred to as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener filter, is described.
Section 3.2 describes a first embodiment, referred to as Speech Distortion Regularized GSC (SDR-GSC). A new design criterion is developed for the adaptive stage of the GSC: the ANC design criterion is supplemented with a regularization term that limits speech distortion due to signal model errors. In the SDR-GSC, a parameter μ is incorporated that allows for a trade-off between speech distortion and noise reduction. Focussing all attention on noise reduction results in the standard GSC, while, on the other hand, focussing all attention on speech distortion results in the output of the fixed beamformer. In noise scenarios with low SNR, adaptivity in the SDR-GSC can be easily reduced or excluded by increasing attention towards speech distortion, i.e., by decreasing the parameter μ to 0. The SDR-GSC is an alternative technique to the QIC-GSC to decrease the sensitivity of the GSC to signal model errors such as microphone mismatch, reverberation, ... In contrast to the QIC-GSC, the SDR-GSC shifts emphasis towards speech distortion when the amount of speech leakage grows. In the absence of signal model errors, the performance of the GSC is preserved. As a result, a better noise reduction performance is obtained for small model errors, while guaranteeing robustness against large model errors.

In a second embodiment, described in Section 3.3, we further improve the noise reduction performance of the SDR-GSC by adding an extra adaptive filtering operation w0 on the speech reference signal. We refer to this generalized scheme as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF). The SP-SDW-MWF is depicted in Figure 3 and encompasses the MWF as a special case. Again, a parameter μ is incorporated in the design criterion to allow for a trade-off between speech distortion and noise reduction. Focussing all attention on speech distortion results in the output of the fixed beamformer. Also here, adaptivity can be easily reduced or excluded by decreasing μ to 0. It is shown that, in the absence of speech leakage and for infinitely long filter lengths, the SP-SDW-MWF corresponds to a cascade of an SDR-GSC with an SDW-SWF postfilter. In the presence of speech leakage, the SP-SDW-MWF with w0 tries to preserve its performance: compared to an SDR-GSC with SDW-SWF postfilter, the SP-SDW-MWF then contains extra filtering operations that compensate for the performance degradation of the SDR-GSC with SDW-SWF due to speech leakage. In contrast to the SDR-GSC (and thus also the GSC), its performance does not degrade due to microphone mismatch. In [22, 27], recursive implementations of the (SDW-)MWF have been proposed based on a GSVD³ or QR decomposition. A subband implementation [28] results in improved intelligibility at a significantly lower cost compared to the fullband approach. These techniques can be extended to implement the SDR-GSC and, more generally, the SP-SDW-MWF.

In a third embodiment, described in Section 4, we propose cheap time-domain and frequency-domain stochastic gradient implementations of the SDR-GSC and SP-SDW-MWF. Starting from the design criterion of the SDR-GSC, or more generally, the SP-SDW-MWF, we derive a time-domain stochastic gradient algorithm. In addition, we modify the LMS based algorithm [26] so that it applies to the SP-SDW-MWF. To increase convergence and reduce complexity, a frequency-domain implementation has been proposed. Both the stochastic gradient and the LMS based algorithm suffer from a large excess error when applied in highly time-varying noise scenarios. We show that the excess error in the stochastic gradient algorithm is reduced by applying a low pass filter to the part of the gradient estimate that limits speech distortion. The low pass filtering avoids a highly time-varying distortion of the desired speech component while not degrading the tracking performance needed in time-varying noise scenarios. The stochastic gradient SP-SDW-MWF outperforms the LMS based algorithm, while complexity is not increased. Experimental results show that the low pass filtering significantly improves the performance of the stochastic gradient algorithm and does not compromise the tracking of changes in the noise scenario. In addition, experiments demonstrate that the proposed stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over QIC-GSC. The limited computational cost and the better noise reduction performance of the proposed algorithm make it a good alternative to the SPA [14] for implementation in hearing aids.

³ The implementation based on GSVD can only be used for the SP-SDW-MWF with filter w0.

3 Spatially pre-processed SDW Multi-channel Wiener filter
3.1 Concept

Figure 3 describes the Spatially pre-processed, Speech Distortion Weighted Multi-channel Wiener filter (SP-SDW-MWF). The SP-SDW-MWF consists of a fixed, spatial pre-processor, i.e., a fixed beamformer $\mathbf{A}(z)$ and a blocking matrix $\mathbf{B}(z)$, and an adaptive Speech Distortion Weighted Multi-channel Wiener filter (SDW-MWF). Given $M$ microphone signals

\[
u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1, \ldots, M \tag{33}
\]

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise contribution, the fixed beamformer $\mathbf{A}(z)$ creates a so-called speech reference

\[
y_0[k] = y_0^s[k] + y_0^n[k], \tag{34}
\]

by steering a beam towards the direction of the desired signal, with a speech contribution $y_0^s[k]$ and a noise contribution $y_0^n[k]$. In the sequel, an endfire array is assumed with the desired speaker in front at 0°. To preserve the robustness advantage of the MWF, the fixed beamformer $\mathbf{A}(z)$ should be designed so that the distortion in the speech reference $y_0^s[k]$ is minimal for all possible errors in the assumed signal model such as microphone mismatch. In the sequel, a delay-and-sum beamformer is used. For small-sized arrays, this beamformer offers sufficient robustness against signal model errors as it minimizes the white noise gain or noise sensitivity. Given statistical knowledge about the signal model errors that occur in practice, a further optimized beamformer $\mathbf{A}(z)$ can be designed, e.g., using the techniques in [31]. The blocking matrix $\mathbf{B}(z)$ creates $M-1$ so-called noise references

\[
y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1, \ldots, M-1 \tag{35}
\]

by steering zeroes towards the front so that the noise contributions $y_i^n[k]$ are dominant compared to the speech leakage contributions $y_i^s[k]$. A simple technique to create the noise references consists of pairwise subtracting the microphone signals after time-aligning them for 0°. Using [31, 34], further optimized noise references can be created. Speech leakage can then be minimized for a specified angular region around 0° instead of for 0° only, e.g., for an angular region from -20° to 20°. In addition, given statistical knowledge about the signal model errors that occur in practice, speech leakage can be minimized for all possible model errors by using [31].
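The simple fixed pre-processor mentioned above (delay-and-sum beamformer plus pairwise subtraction) can be sketched as below. The sketch assumes that the microphone signals have already been time-aligned for the 0° direction, so that the beamformer reduces to an average; the function name `spatial_preprocessor` and the 10 % gain mismatch used in the example are hypothetical.

```python
def spatial_preprocessor(u):
    """Fixed pre-processor for M time-aligned microphone signals.

    u: list of M equal-length lists (one per microphone).
    Returns the speech reference y0 (average of the microphones) and
    M-1 noise references obtained by pairwise subtraction.
    """
    M, K = len(u), len(u[0])
    y0 = [sum(u[m][k] for m in range(M)) / M for k in range(K)]
    refs = [[u[m][k] - u[m + 1][k] for k in range(K)] for m in range(M - 1)]
    return y0, refs

s = [1.0, 2.0, 3.0]                       # speech arriving from 0 degrees
# Perfectly matched microphones: no speech leaks into the noise references
y0_ideal, refs_ideal = spatial_preprocessor([s, s, s])
# 10 % gain mismatch at the second microphone: speech leakage appears
mismatched = [s, [1.1 * v for v in s], s]
y0_mis, refs_mis = spatial_preprocessor(mismatched)
```

The second call shows how a small microphone mismatch already produces a non-zero speech component in the noise references, which is the leakage the adaptive stage has to cope with.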

In the sequel, the superscripts $s$ and $n$ are used to refer to the speech and noise contribution of a signal. During periods of speech + noise, the references $y_i[k]$, $i = 0, \ldots, M-1$ contain speech + noise. During periods of noise only, $y_i[k]$, $i = 0, \ldots, M-1$ only consist of a noise component, i.e., $y_i[k] = y_i^n[k]$. The second order statistics of the noise signal are assumed to be quite stationary such that they can be estimated during periods of noise only.

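The SDW-MWF filter⁵ $\mathbf{w}_{0:M-1}$

\[
\mathbf{w}_{0:M-1} = \left( \frac{1}{\mu}\, \mathcal{E}\left\{ \mathbf{y}_{0:M-1}^s[k]\, \mathbf{y}_{0:M-1}^{s,H}[k] \right\} + \mathcal{E}\left\{ \mathbf{y}_{0:M-1}^n[k]\, \mathbf{y}_{0:M-1}^{n,H}[k] \right\} \right)^{-1} \mathcal{E}\left\{ \mathbf{y}_{0:M-1}^n[k]\, y_0^{n,*}[k-\Delta] \right\}, \tag{36}
\]

with

\[
\mathbf{w}_{0:M-1}^H[k] = \begin{bmatrix} \mathbf{w}_0^H[k] & \mathbf{w}_1^H[k] & \cdots & \mathbf{w}_{M-1}^H[k] \end{bmatrix}, \tag{37}
\]

\[
\mathbf{w}_i[k] = \begin{bmatrix} w_i[0] & w_i[1] & \cdots & w_i[L-1] \end{bmatrix}^T, \tag{38}
\]

\[
\mathbf{y}_{0:M-1}^H[k] = \begin{bmatrix} \mathbf{y}_0^H[k] & \mathbf{y}_1^H[k] & \cdots & \mathbf{y}_{M-1}^H[k] \end{bmatrix}, \tag{39}
\]

\[
\mathbf{y}_i[k] = \begin{bmatrix} y_i[k] & y_i[k-1] & \cdots & y_i[k-L+1] \end{bmatrix}^T, \tag{40}
\]

provides an estimate $\mathbf{w}_{0:M-1}^H \mathbf{y}_{0:M-1}[k]$ of the noise contribution $y_0^n[k-\Delta]$ in the speech reference by minimizing the cost function $J(\mathbf{w}_{0:M-1})$

\[
J(\mathbf{w}_{0:M-1}) = \frac{1}{\mu} \underbrace{\mathcal{E}\left\{ \left| \mathbf{w}_{0:M-1}^H \mathbf{y}_{0:M-1}^s[k] \right|^2 \right\}}_{\varepsilon_d^2} + \underbrace{\mathcal{E}\left\{ \left| y_0^n[k-\Delta] - \mathbf{w}_{0:M-1}^H \mathbf{y}_{0:M-1}^n[k] \right|^2 \right\}}_{\varepsilon_n^2}. \tag{41}
\]

⁵ In a time-domain implementation, the input signals of the adaptive filter and the filter itself are real and hence $\mathbf{w}_{0:M-1}^H = \mathbf{w}_{0:M-1}^T$. In the sequel, the formulas are generalized to complex input signals so that they can also be applied to a subband implementation.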



The subscript 0 : M - 1 in $\mathbf{w}_{0:M-1}$ and $\mathbf{y}_{0:M-1}$ refers to the subscripts of the first and last channel
component of the adaptive filter and input vector, respectively. The term
$\varepsilon_d^2 = \mathcal{E}\{|\mathbf{w}^H_{0:M-1}\mathbf{y}^s_{0:M-1}[k]|^2\}$ represents the speech distortion energy and
$\varepsilon_n^2 = \mathcal{E}\{|y_0^n[k-\Delta] - \mathbf{w}^H_{0:M-1}\mathbf{y}^n_{0:M-1}[k]|^2\}$ the residual noise energy. The term
$\frac{1}{\mu}\varepsilon_d^2$ in the cost function (41) limits the possible amount of speech distortion at the output of the
SP-SDW-MWF. Hence, the SP-SDW-MWF adds robustness against signal model errors to the GSC by taking speech
distortion explicitly into account in the design criterion of the adaptive stage. The parameter
$\frac{1}{\mu} \in [0, \infty)$ trades off between noise reduction and speech distortion: the larger $\frac{1}{\mu}$, the
smaller the amount of possible speech distortion. For $\mu = 0$, the output of the fixed beamformer A(z),
delayed by Δ samples, is obtained. In noise scenarios with very low Signal-to-Noise Ratio (SNR), a fixed
beamformer may be preferred. Adaptivity can be easily reduced or excluded in the SP-SDW-MWF by decreasing
$\mu$ to 0. Alternatively, adaptivity can be limited by applying a QIC to $\mathbf{w}_{0:M-1}$.
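Note that when the fixed beamformer A(z) and the blocking matrix B(z) are set to

$$\mathbf{A}(z) = \left[1 \;\; 0 \;\; \cdots \;\; 0\right]^H, \qquad (42)$$

$$\mathbf{B}(z) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}, \qquad (43)$$

we obtain the original SDW-MWF that operates on the received microphone signals $u_i[k]$, i = 1, ..., M.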
Below, the different parameter settings of the SP-SDW-MWF are discussed. Depending on the setting of
the parameter $\mu$ and the presence or absence of the filter $\mathbf{w}_0$, the GSC, the (SDW-)MWF as well as in-between
solutions such as the Speech Distortion Regularized GSC (SDR-GSC) may be obtained. We distinguish
between two cases, i.e., the case where no filter $\mathbf{w}_0$ is applied to the speech reference (filter length $L_0 = 0$)
and the case where an additional filter $\mathbf{w}_0$ is used ($L_0 \neq 0$).
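
As a rough numerical illustration of how the trade-off parameter acts in the closed-form solution (36), the sketch below solves the regularized normal equations for a small, made-up example. The correlation matrices `Rs` and `Rn`, the cross-correlation vector `rn`, and the helper names `solve` and `sdw_mwf` are illustrative assumptions and do not come from the text; real-valued signals are assumed.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def sdw_mwf(Rs, Rn, rn, inv_mu):
    """Filter of the form ((1/mu) Rs + Rn)^-1 rn, with inv_mu = 1/mu."""
    n = len(rn)
    A = [[inv_mu * Rs[i][j] + Rn[i][j] for j in range(n)] for i in range(n)]
    return solve(A, rn)


# Hypothetical second-order statistics (assumed values for illustration only).
Rs = [[1.0, 0.5], [0.5, 1.0]]   # clean-speech correlation matrix
Rn = [[2.0, 0.0], [0.0, 1.0]]   # noise correlation matrix
rn = [1.0, 0.5]                 # noise cross-correlation with the speech reference

for inv_mu in (0.0, 1.0, 100.0):
    print(inv_mu, sdw_mwf(Rs, Rn, rn, inv_mu))
```

With `inv_mu = 0` the speech term is ignored and only the noise statistics determine the filter; as `inv_mu` grows, the weights shrink towards zero, so less is subtracted from the speech reference.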

The adaptive stage of the SP-SDW-MWF can be implemented using the recursive QRD-based implementation
of the SDW-MWF [22]. Like for the SDW-MWF, complexity can be reduced by a subband
implementation [23]. For $L_0 \neq 0$, also the GSVD based algorithm [20] can be applied. Cheaper stochastic
gradient based algorithms are proposed in Section 4.

3.2 First embodiment: SDR-GSC, i.e., SP-SDW-MWF without $\mathbf{w}_0$

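First, consider the case without $\mathbf{w}_0$, i.e. $L_0 = 0$. The solution for $\mathbf{w}_{1:M-1}$ in (36) then reduces to

$$\arg\min_{\mathbf{w}_{1:M-1}} \; \frac{1}{\mu}\underbrace{\mathcal{E}\{|\mathbf{w}^H_{1:M-1}\mathbf{y}^s_{1:M-1}[k]|^2\}}_{\varepsilon_d^2} + \underbrace{\mathcal{E}\{|y_0^n[k-\Delta] - \mathbf{w}^H_{1:M-1}\mathbf{y}^n_{1:M-1}[k]|^2\}}_{\varepsilon_n^2}, \qquad (44)$$

leading to

$$\mathbf{w}_{1:M-1} = \left(\frac{1}{\mu}\,\mathcal{E}\{\mathbf{y}^s_{1:M-1}\mathbf{y}^{s,H}_{1:M-1}[k]\} + \mathcal{E}\{\mathbf{y}^n_{1:M-1}\mathbf{y}^{n,H}_{1:M-1}[k]\}\right)^{-1} \mathcal{E}\{\mathbf{y}^n_{1:M-1}\,y_0^{n,*}[k-\Delta]\}, \qquad (45)$$

where $\varepsilon_d^2$ is the speech distortion energy and $\varepsilon_n^2$ the residual noise energy.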

JtemarkForLo^O. it is reaiSly seen that aV does not hold, le.. wi^^-i + wi:m-i # e^whei^ 

^i:M-x = (£{y!:A^-iySS_x} + A^{y?,*f-xy5Vi^_ j) e{yi^.^ynk - aj}, (46) 

6ecfl«.c /Aa ;5pe.cA component y!,j»,_ Jfc] in the input to the ad^Uve filter vr^M-xdoes not contain the 
estimated speech sipial — A]. 

If $\mu = 1$, the classical MMSE criterion (cf. (17)) is obtained.

Compared to the optimization criterion (6) of the GSC, a regularization term

$$\frac{1}{\mu}\,\mathcal{E}\{|\mathbf{w}^H_{1:M-1}\mathbf{y}^s_{1:M-1}[k]|^2\} \qquad (47)$$

has been added. This regularization term limits the amount of speech distortion that is caused by the filter
$\mathbf{w}_{1:M-1}$ when speech leaks into the noise references, i.e., $y_i^s[k] \neq 0$, i = 1, ..., M - 1. In the sequel we
therefore refer to the SP-SDW-MWF with $L_0 = 0$ as Speech Distortion Regularized GSC (SDR-GSC). The
smaller $\mu$, the smaller the resulting amount of speech distortion will be. For $\mu = 0$, the output of the fixed
beamformer A(z), delayed by Δ samples, is obtained. For $\mu = \infty$, all emphasis is put on noise reduction and
speech distortion is not taken into account. This corresponds to the GSC. Hence, the SDR-GSC encompasses
the GSC as a special case.

The regularization term $\frac{1}{\mu}\mathcal{E}\{|\mathbf{w}^H_{1:M-1}\mathbf{y}^s_{1:M-1}[k]|^2\}$ with $\frac{1}{\mu} \neq 0$ adds robustness to the GSC, while
not affecting the noise reduction performance in the absence of speech leakage.

• In the absence of speech leakage, i.e., $y_i^s[k] = 0$, i = 1, ..., M - 1, the regularization term equals 0
for all $\mathbf{w}_{1:M-1}$ and hence the residual noise energy $\varepsilon_n^2$ is effectively minimized. In other words, in the
absence of speech leakage, the GSC solution is obtained.

• In the presence of speech leakage, i.e., $y_i^s[k] \neq 0$, i = 1, ..., M - 1, speech distortion is explicitly taken
into account in the optimization criterion (44) for the adaptive filter $\mathbf{w}_{1:M-1}$, limiting speech distortion
while reducing noise. The larger the amount of speech leakage, the more attention is paid to speech
distortion.

To limit speech distortion alternatively, a QIC is often imposed on the filter $\mathbf{w}_{1:M-1}$ (see Section 1.2).
In contrast to the SDR-GSC, the QIC acts irrespective of the amount of speech leakage $y_i^s[k]$ that is
present. The constraint value $\beta^2$ in (11) has to be chosen based on the largest model errors that may
occur. As a consequence, noise reduction performance is compromised even when no or very small
model errors are present. Hence, the QIC is more conservative than the SDR-GSC. The experimental
results in Section 3.4 confirm this.



3.3 Second embodiment: SP-SDW-MWF with filter $\mathbf{w}_0$

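Since the SDW-MWF (36) takes speech distortion explicitly into account in its optimization criterion, an
additional filter $\mathbf{w}_0$ on the speech reference $y_0[k]$ may be added. The SDW-MWF (36) then solves the
following more general optimization criterion

$$\mathbf{w}_{0:M-1} = \arg\min_{\mathbf{w}_{0:M-1}} \; \frac{1}{\mu}\,\mathcal{E}\{|\mathbf{w}^H_{0:M-1}\mathbf{y}^s_{0:M-1}[k]|^2\} + \mathcal{E}\{|y_0^n[k-\Delta] - \mathbf{w}^H_{0:M-1}\mathbf{y}^n_{0:M-1}[k]|^2\}, \qquad (48)$$

where $\mathbf{w}^H_{0:M-1} = [\mathbf{w}_0^H \;\; \mathbf{w}^H_{1:M-1}]$ is given by (36).

Again, $\mu$ trades off speech distortion and noise reduction. For $\mu = \infty$, speech distortion is completely
ignored so that the solution becomes

$$\mathbf{w}_0 = \left[\mathbf{0}_{1\times\Delta} \;\; 1 \;\; \mathbf{0}_{1\times(L_0-\Delta-1)}\right]^T, \qquad \mathbf{w}_{1:M-1} = \mathbf{0}, \qquad (49)$$

which results in a zero output signal. For $\mu = 0$, all attention is paid to speech distortion so that the output
of the fixed beamformer, delayed by Δ samples, is obtained.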

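• In the absence of speech leakage, i.e., $y_i^s[k] = 0$ for i = 1, ..., M - 1, and for infinitely long filters
$\mathbf{w}_i$, i = 0, ..., M - 1, the SP-SDW-MWF with $\mathbf{w}_0$ corresponds to the cascade of a SDR-GSC and a
SDW Single-channel WF (SDW-SWF) postfilter [30, 35].

Proof: In case of infinite filter lengths, the SP-SDW-MWF $\mathbf{W}_{0:M-1}(f)$ and its optimization
criterion can be represented in the frequency-domain:

$$\mathbf{W}_{0:M-1}(f) = \arg\min_{\mathbf{W}_{0:M-1}} \; \mathcal{E}\left\{\left|\left[(e^{-j2\pi f\Delta} - W_0^*(f)) \;\; -\mathbf{W}^H_{1:M-1}(f)\right]\begin{bmatrix} Y_0^n(f) \\ \mathbf{Y}^n_{1:M-1}(f) \end{bmatrix}\right|^2\right\} + \frac{1}{\mu}\,\mathcal{E}\left\{\left|\left[W_0^*(f) \;\; \mathbf{W}^H_{1:M-1}(f)\right]\begin{bmatrix} Y_0^s(f) \\ \mathbf{Y}^s_{1:M-1}(f) \end{bmatrix}\right|^2\right\}. \qquad (50)$$

Without loss of generality, we assume - for reasons of simplicity - Δ = 0.
Decompose $\mathbf{W}_{1:M-1}(f)$ as

$$\mathbf{W}_{1:M-1}(f) = (1 - W_0(f))\,\mathbf{W}_{d,1:M-1}(f) \qquad (51)$$

with $W_0(f)$ a single-channel and $\mathbf{W}_{d,1:M-1}(f)$ a multi-channel filter and define an intermediate
output V(f) (see also Figure 4) as

$$V(f) = Y_0(f) - \mathbf{W}^H_{d,1:M-1}(f)\,\mathbf{Y}_{1:M-1}(f). \qquad (52)$$

Then, the cost function $J(W_0, \mathbf{W}_{d,1:M-1})$ of (50) can be re-written as

$$J = \mathcal{E}\{|(1 - W_0^*(f))\,V^n(f)|^2\} + \frac{1}{\mu}\,\mathcal{E}\{|W_0^*(f)\,V^s(f) + \mathbf{W}^H_{d,1:M-1}(f)\,\mathbf{Y}^s_{1:M-1}(f)|^2\}. \qquad (53)$$

From $\partial J(W_0, \mathbf{W}_{d,1:M-1}) / \partial W_0 = 0$, we find

$$W_0(f) = W_{0,1}(f) + W_{0,2}(f). \qquad (54)$$

This single-channel filter $W_0(f)$ consists of two terms.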

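- The first term

$$W_{0,1}(f) = \left(\mathcal{E}\{V^n V^{n,*}\} + \frac{1}{\mu}\,\mathcal{E}\{V^s V^{s,*}\}\right)^{-1} \mathcal{E}\{V^n V^{n,*}\} \qquad (55)$$

estimates the noise component $V^n(f)$ in the intermediate output V(f). The filter $1 - W_{0,1}$ corresponds
to a SDW Single-channel Wiener Filter (SDW-SWF) that estimates the speech component $V^s(f)$.

- The second term

$$W_{0,2}(f) = \left(\mathcal{E}\{V^n V^{n,*}\} + \frac{1}{\mu}\,\mathcal{E}\{V^s V^{s,*}\}\right)^{-1} \left(-\frac{1}{\mu}\,\mathcal{E}\{V^s\,\mathbf{Y}^{s,H}_{1:M-1}\mathbf{W}_{d,1:M-1}\}\right) \qquad (56)$$

estimates the speech leakage filtered by $\mathbf{W}_{d,1:M-1}(f)$, i.e., $-\mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$. The speech
component in the intermediate output equals $V^s = Y_0^s - \mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$. The filter
$W_{0,2}(f)$ tries to compensate for the distortion $-\mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$ by adding an estimate of
$\mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$ to the output of the SDW-SWF.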

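In the absence of speech leakage (i.e., $\mathbf{Y}^s_{1:M-1} = 0$), the filter $W_{0,2}(f)$ equals zero and $1 - W_0(f)$
corresponds to a SDW-SWF.

From $\partial J(W_0, \mathbf{W}_{d,1:M-1}) / \partial \mathbf{W}_{d,1:M-1} = 0$, we obtain the following solution for $\mathbf{W}_{d,1:M-1}(f)$:

$$\mathbf{W}_{d,1:M-1}(f) = \left(\mathcal{E}\{\mathbf{Y}^n_{1:M-1}\mathbf{Y}^{n,H}_{1:M-1}\} + \frac{1}{\mu}\,\mathcal{E}\{\mathbf{Y}^s_{1:M-1}\mathbf{Y}^{s,H}_{1:M-1}\}\right)^{-1} \left(\mathcal{E}\{\mathbf{Y}^n_{1:M-1}Y_0^{n,*}\} - \frac{1}{\mu}\,\frac{W_0}{1 - W_0}\,\mathcal{E}\{\mathbf{Y}^s_{1:M-1}Y_0^{s,*}\}\right). \qquad (57)$$

Also the multi-channel filter $\mathbf{W}_{d,1:M-1}(f)$ consists of two terms.

- The first term corresponds to the SDR-GSC

$$\left(\mathcal{E}\{\mathbf{Y}^n_{1:M-1}\mathbf{Y}^{n,H}_{1:M-1}\} + \frac{1}{\mu}\,\mathcal{E}\{\mathbf{Y}^s_{1:M-1}\mathbf{Y}^{s,H}_{1:M-1}\}\right)^{-1} \mathcal{E}\{\mathbf{Y}^n_{1:M-1}Y_0^{n,*}\} \qquad (58)$$

and estimates the noise component $Y_0^n(f)$ at the output of the fixed beamformer.

- The second term tries to compensate for the speech distortion $-W_0^*(f)Y_0^s(f)$ caused by $W_0(f)$
by adding an estimate of $\frac{W_0^*(f)}{1 - W_0^*(f)}Y_0^s(f)$ to the output of the SDR-GSC. Note that this
corresponds to adding an estimate of $W_0^*(f)Y_0^s(f)$ to the output Z(f) of the SP-SDW-MWF.

In the absence of speech leakage, $\mathbf{W}_{d,1:M-1}(f)$ corresponds to a SDR-GSC or a GSC.
Figure 4 illustrates graphically the solution for $\mathbf{W}_{d,1:M-1}(f)$ and $W_0(f)$ for Δ = 0. In the absence of
speech leakage, the filters that try to compensate for the speech distortion equal 0; hence, the SP-SDW-MWF
corresponds to a SDR-GSC (or GSC) with SDW-SWF postfilter. The SP-SDW-MWF achieves
the same or a better Signal-to-Noise Ratio (SNR) improvement than the SDR-GSC, depending on the
noise scenario.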

3.4 Experimental results 

This section illustrates the theoretical results of Section 3.2 and Section 3.3 by means of experimental
results for a hearing aid application. Section 3.4.1 and Section 3.4.2, respectively, describe the set-up and
the performance measures that are used. In Section 3.4.3, the impact of the different parameter settings of
the SP-SDW-MWF on the performance and the sensitivity to signal model errors is evaluated. Comparison
is made with the QIC-GSC.

3.4.1 Set-up

A three-microphone Behind-The-Ear (BTE) hearing aid with three omnidirectional microphones (Knowles
FG-3452) has been mounted on a dummy head in an office room. The interspacing d between the first and
the second microphone is about 1 cm and the interspacing between the second and third microphone is
about 1.5 cm. The reverberation time $T_{60\mathrm{dB}}$ is about 700 ms for a speech weighted noise. The desired
speech signal and the noise signals are uncorrelated. Both the speech and the noise signal have a level of
70 dB SPL at the center of the head. The desired speech source and noise sources are positioned at a distance
of 1 meter from the head: the speech source in front of the head, the noise sources at an angle θ w.r.t. the
speech source. To get an idea of the average performance based on directivity only, stationary speech and
noise signals with the same average long-term power spectral density are used. The signals can be found
at [36]. The total duration of the input signal is 10 seconds, of which 5 seconds contain noise only and 5
seconds contain both the speech and noise signal. For evaluation purposes, the speech and noise signal have
been recorded separately.

The microphone signals are pre-whitened prior to processing to improve intelligibility [37], and the
output is accordingly de-whitened. In the experiments, the microphones have been calibrated by means of
recordings of an anechoic speech weighted noise signal positioned at 0°, measured while the microphone
array was mounted on the head. A delay-and-sum beamformer is used as a fixed beamformer, since - in case
of small microphone interspacing - it is robust to model errors. The blocking matrix B pairwise subtracts
the time aligned calibrated microphone signals.

To investigate the effect of the different parameter settings (i.e. $\mu$, $\mathbf{w}_0$) on the performance only, the
filter coefficients are computed using (36), where $\mathcal{E}\{\mathbf{y}^s_{0:M-1}\mathbf{y}^{s,H}_{0:M-1}\}$ is estimated by means of the clean
speech contributions of the microphone signals. In practice, $\mathcal{E}\{\mathbf{y}^s_{0:M-1}\mathbf{y}^{s,H}_{0:M-1}\}$ is approximated using (30).
The effect of approximation (30) on the performance was found to be small (i.e. differences of at most
0.5 dB in intelligibility weighted signal-to-noise ratio improvement) for the given data set. The QIC-GSC
is implemented using variable loading RLS [19]. The filter length L per channel equals 96.

3.4.2 Performance measures 

To assess the performance of the different approaches, the broadband intelligibility weighted signal-to-noise
ratio improvement [38] is used, defined as

$$\Delta\mathrm{SNR}_{\mathrm{intellig}} = \sum_i I_i\,(\mathrm{SNR}_{i,\mathrm{out}} - \mathrm{SNR}_{i,\mathrm{in}}), \qquad (59)$$

where the band importance function $I_i$ expresses the importance of the i-th one-third octave band with
center frequency $f_i^c$ for intelligibility, $\mathrm{SNR}_{i,\mathrm{out}}$ is the output SNR (in dB) and $\mathrm{SNR}_{i,\mathrm{in}}$ is the input SNR
(in dB) in the i-th one-third octave band. The center frequencies $f_i^c$ and the values $I_i$ are defined in [39].
The intelligibility weighted signal-to-noise ratio reflects how much intelligibility is improved by the noise
reduction algorithms, but does not take into account speech distortion.

To measure the amount of speech distortion, we define the following intelligibility weighted spectral
distortion measure

$$\mathrm{SD}_{\mathrm{intellig}} = \sum_i I_i\,\mathrm{SD}_i \qquad (60)$$

with $\mathrm{SD}_i$ the average spectral distortion (dB) in the i-th one-third octave band, measured as

$$\mathrm{SD}_i = \frac{1}{(2^{1/6} - 2^{-1/6})\,f_i^c}\int_{2^{-1/6} f_i^c}^{2^{1/6} f_i^c} \left|10\log_{10} G^s(f)\right|\,df, \qquad (61)$$

with $G^s(f)$ the power transfer function of speech from the input to the output of the noise reduction
algorithm.

To exclude the effect of the spatial pre-processor, the performance measures are calculated w.r.t. the
output of the fixed beamformer.
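
The two intelligibility weighted measures are plain importance-weighted sums over one-third octave bands, which the following minimal sketch spells out. The band importance values and per-band numbers used here are hypothetical placeholders (the actual values are those defined in [39]), and the function names are illustrative only.

```python
def weighted_sum(values_db, band_importance):
    """Importance-weighted sum over one-third octave bands."""
    if len(values_db) != len(band_importance):
        raise ValueError("one value per band is required")
    return sum(w * v for w, v in zip(band_importance, values_db))


def delta_snr_intellig(snr_out_db, snr_in_db, band_importance):
    """Weighted sum of the per-band SNR improvements (output minus input, in dB)."""
    gains = [o - i for o, i in zip(snr_out_db, snr_in_db)]
    return weighted_sum(gains, band_importance)


def sd_intellig(sd_db, band_importance):
    """Weighted sum of the per-band average spectral distortions (in dB)."""
    return weighted_sum(sd_db, band_importance)


# Hypothetical three-band example (assumed numbers, for illustration only).
importance = [0.2, 0.5, 0.3]
snr_in = [0.0, 0.0, 0.0]
snr_out = [5.0, 10.0, 0.0]
print(delta_snr_intellig(snr_out, snr_in, importance))
print(sd_intellig([1.0, 2.0, 0.5], importance))
```

Because the weights emphasize the bands that matter most for intelligibility, a large SNR gain in a band with low importance contributes little to the overall figure.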

3.4.3 Experimental results

The impact of the different parameter settings for $\mu$ and $\mathbf{w}_0$ on the performance of the SP-SDW-MWF is
illustrated for a five noise source scenario. The five noise sources are positioned at angles 75°, 120°, 180°, 240°,
285° w.r.t. the desired source at 0°. To assess the sensitivity of the algorithm against errors in the assumed
signal model, the influence of microphone mismatch, e.g., gain mismatch of the second microphone, on
the performance is depicted. Among the different possible signal model errors, microphone mismatch was
found to be especially harmful to the performance of the GSC in a hearing aid application [17]. In hearing
aids, microphones are rarely matched in gain and phase. In [3], gain and phase differences between
microphone characteristics of up to 6 dB and 10°, respectively, have been reported.

SP-SDW-MWF without $\mathbf{w}_0$ (SDR-GSC)

Figure 5 plots the improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and the speech distortion $\mathrm{SD}_{\mathrm{intellig}}$ as a function of $\frac{1}{\mu}$ obtained
by the SDR-GSC (i.e., the SP-SDW-MWF without filter $\mathbf{w}_0$) for different gain mismatches $\Upsilon_2$ at the second
microphone. In the absence of microphone mismatch, the amount of speech leakage into the noise references
is limited. Hence, the amount of speech distortion is low for all $\mu$. Since there is still a small amount of
speech leakage due to reverberation, the amount of noise reduction and speech distortion slightly decreases
for increasing $\frac{1}{\mu}$, especially for $\frac{1}{\mu} > 1$. In the presence of microphone mismatch, the amount of speech
leakage into the noise references grows. For $\frac{1}{\mu} = 0$ (GSC), the speech gets significantly distorted. Due to
the cancellation of the desired signal, also the improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ degrades. Setting $\frac{1}{\mu} > 0$ improves
the performance of the GSC in the presence of model errors without compromising performance in the
absence of signal model errors.

SP-SDW-MWF with filter $\mathbf{w}_0$

Figure 6 plots the performance measures $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and $\mathrm{SD}_{\mathrm{intellig}}$ of the SP-SDW-MWF with filter $\mathbf{w}_0$.
In general, the amount of speech distortion and noise reduction grows for decreasing $\frac{1}{\mu}$. For $\mu = \infty$,
all attention is paid to noise reduction. As also illustrated by Figure 6, this results in a total cancellation
of the speech and the noise signal and hence degraded performance. In the absence of model errors, the
settings $L_0 = 0$ and $L_0 \neq 0$ result - except for $\frac{1}{\mu} = 0$ - in the same $\Delta\mathrm{SNR}_{\mathrm{intellig}}$, while the distortion
for the SP-SDW-MWF with $\mathbf{w}_0$ is higher due to the additional single-channel SDW-MWF. For $L_0 \neq 0$, the
performance does - in contrast to $L_0 = 0$ - not degrade due to the microphone mismatch.

Comparison with QIC 

Figure 7 depicts the improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and the speech distortion $\mathrm{SD}_{\mathrm{intellig}}$, respectively, of the
QIC-GSC as a function of $\beta^2$. Like the SDR-GSC, the QIC increases the robustness of the GSC. The QIC is
independent of the amount of speech leakage. As a consequence, distortion grows fast with increasing gain
deviation. The constraint value $\beta^2$ should be chosen so that the maximum permissible speech distortion level
is not exceeded for the largest possible model errors. This goes at the expense of reduced noise reduction for
small model errors. The SDR-GSC, on the other hand, keeps the speech distortion limited for all model errors
(see Figure 5). Attention towards speech distortion is increased if the amount of speech leakage grows. As a
result, a better noise reduction performance is obtained for small model errors, while guaranteeing sufficient
robustness for large model errors. In addition, Figure 6 demonstrates that an additional filter $\mathbf{w}_0$ significantly
improves the performance of the SP-SDW-MWF in the presence of signal model errors.

3.5 Conclusion

In the present invention, we established a generalized noise reduction scheme, referred to as Spatially
pre-processed, Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF), that consists of a fixed
spatial pre-processor and an adaptive stage that is based on a SDW-MWF. The new scheme encompasses the
GSC and MWF as special cases. In addition, it allows for an in-between solution that can be interpreted as a
Speech Distortion Regularized GSC. Depending on the setting of a trade-off parameter $\mu$ and the presence
or absence of the filter $\mathbf{w}_0$ on the speech reference, the GSC, the SDR-GSC or a (SDW-)MWF is obtained.

In Section 3.2 and Section 3.3, the different parameter settings of the SP-SDW-MWF have been interpreted.

• Without $\mathbf{w}_0$, the SP-SDW-MWF corresponds to a SDR-GSC: the ANC design criterion is supplemented
with a regularization term that limits the speech distortion due to signal model errors. The
larger $\frac{1}{\mu}$, the smaller the amount of distortion. For $\frac{1}{\mu} = 0$, distortion is ignored completely, which
corresponds to the GSC solution. The SDR-GSC is then an alternative technique to the QIC-GSC to
decrease the sensitivity of the GSC to signal model errors. In contrast to the QIC-GSC, the SDR-GSC
shifts emphasis towards speech distortion when the amount of speech leakage grows. In the absence
of signal model errors, the performance of the GSC is preserved. As a result, a better noise reduction
performance is obtained for small model errors, while guaranteeing robustness against large model
errors.

• Since the SP-SDW-MWF takes speech distortion explicitly into account, a filter $\mathbf{w}_0$ on the speech
reference can be added. It is shown that - in the absence of speech leakage and for infinitely long filter
lengths - the SP-SDW-MWF corresponds to a cascade of a SDR-GSC with a SDW-SWF postfilter.
In the presence of speech leakage, the SP-SDW-MWF with $\mathbf{w}_0$ tries to preserve its performance:
compared to a SDR-GSC with SDW-SWF postfilter, the SP-SDW-MWF then contains extra filtering
operations that compensate for the performance degradation of the SDR-GSC with SDW-SWF due to
speech leakage. In contrast to the SDR-GSC (and thus also the GSC), performance does not degrade
due to microphone mismatch.

* For $L_0 \neq 0$, the SNR improvement was larger thanks to the single-channel SDW-MWF postfilter.

In Section 3.4, experimental results for a hearing aid application confirmed the theoretical results of
Section 3.2 and Section 3.3. The SP-SDW-MWF indeed increases the robustness of the GSC against signal
model errors. Comparison with the widely studied QIC-GSC demonstrated that the SP-SDW-MWF achieves
a better noise reduction performance for a given maximum allowable speech distortion level.

4 Third embodiment: Stochastic gradient implementations 

In [22, 27] recursive implementations of the MWF have been proposed based on a GSVD or QR
decomposition. A subband implementation [28] results in improved intelligibility at a significantly lower cost
compared to the fullband approach. These techniques can be extended to implement the SP-SDW-MWF.
However, in contrast to the GSC and the QIC-GSC [14], no cheap stochastic gradient based implementation
of the SP-SDW-MWF is available. In [25], an LMS based algorithm for the MWF has been developed. The
algorithm needs recordings of calibration signals. Since room acoustics, microphone characteristics and the
location of the desired speaker change over time, frequent re-calibration is required, making this approach
cumbersome and expensive. In [26], an LMS based SDW-MWF has been proposed that avoids the need for
calibration signals. The algorithm however relies on some independence assumptions that are not necessarily
satisfied. In the present invention, we propose time-domain and frequency-domain stochastic gradient
implementations of the SP-SDW-MWF that preserve the benefit of the matrix-based SP-SDW-MWF over the
QIC-GSC. The LMS based SDW-MWF of [26] is modified so that it applies to the SP-SDW-MWF scheme. In
addition, other stochastic gradient algorithms are developed that achieve a better performance. Experimental
results demonstrate that the proposed stochastic gradient implementation of the SP-SDW-MWF outperforms
the SPA, while its computational cost is limited.

This section is organized as follows. Starting from the cost function of the SP-SDW-MWF, a time-domain
stochastic gradient algorithm is derived in Section 4.1. Applying the independence assumptions
made in [26] results in an LMS based SP-SDW-MWF similar to [26]. To increase convergence and reduce
complexity, the stochastic gradient and LMS based algorithm are implemented in the frequency-domain.
Both the stochastic gradient and the LMS based algorithm suffer from a large excess error when applied in
highly time-varying noise scenarios. In Section 4.2, we show that the performance of the stochastic gradient
algorithm is improved by applying a low pass filter to the part of the gradient estimate that limits speech
distortion. The low pass filtering avoids a highly time-varying distortion of the desired speech component
while not degrading the tracking performance needed in time-varying noise scenarios. Section 4.3 compares
the performance of the different frequency-domain stochastic gradient algorithms. Experimental results
show that the proposed stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over the
QIC-GSC.

4.1 Stochastic gradient algorithm 
4.1.1 Derivation 

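A stochastic gradient algorithm approximates the steepest descent algorithm, using an instantaneous gradient
estimate. Given the cost function (41), the steepest descent algorithm iterates as follows*

$$\mathbf{w}[n+1] = \mathbf{w}[n] + \frac{\rho}{2}\left(-\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}}\right)_{\mathbf{w}=\mathbf{w}[n]} = \mathbf{w}[n] + \rho\left(\mathcal{E}\{\mathbf{y}^n y_0^{n,*}[k-\Delta]\} - \mathcal{E}\{\mathbf{y}^n\mathbf{y}^{n,H}[k]\}\,\mathbf{w}[n] - \frac{1}{\mu}\,\mathcal{E}\{\mathbf{y}^s\mathbf{y}^{s,H}[k]\}\,\mathbf{w}[n]\right), \qquad (62)$$

with $\mathbf{w}[k], \mathbf{y}[k] \in \mathbb{C}^{NL\times 1}$, where N denotes the number of input channels to the adaptive filter and L the
number of filter taps per channel. Replacing the iteration index n by a time index k and leaving out the
expectation values $\mathcal{E}\{\cdot\}$, we obtain the following update equation

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\left\{\mathbf{y}^n[k]\left(y_0^{n,*}[k-\Delta] - \mathbf{y}^{n,H}[k]\,\mathbf{w}[k]\right) - \underbrace{\frac{1}{\mu}\,\mathbf{y}^s\mathbf{y}^{s,H}[k]\,\mathbf{w}[k]}_{\mathbf{r}[k]}\right\}. \qquad (63)$$

For $\frac{1}{\mu} = 0$ and no filtering $\mathbf{w}_0$ on the speech reference, equation (63) reduces to the update formula used in
the GSC during periods of noise only (i.e., when $y_i[k] = y_i^n[k]$, i = 0, ..., M - 1). The additional term $\mathbf{r}[k]$ in
the gradient estimate limits the speech distortion due to possible signal model errors.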

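Equation (63) requires knowledge of the correlation matrix $\mathbf{y}^s\mathbf{y}^{s,H}[k]$ or $\mathcal{E}\{\mathbf{y}^s\mathbf{y}^{s,H}[k]\}$ of the clean
speech. In practice, this information is not available. To avoid the need for calibration, speech + noise
signal vectors $\mathbf{y}_{buf_1}$ are stored into a circular buffer $\mathbf{B}_1 \in \mathbb{R}^{N\times L_{buf_1}}$ during processing as in [26]. During
periods of noise only (i.e., when $y_i[k] = y_i^n[k]$, i = 0, ..., M - 1), the filter $\mathbf{w}$ is updated using the following
approximation of the term $\mathbf{r}[k] = \frac{1}{\mu}\mathbf{y}^s\mathbf{y}^{s,H}[k]\,\mathbf{w}[k]$ in (63)

$$\frac{1}{\mu}\,\mathbf{y}^s\mathbf{y}^{s,H}[k]\,\mathbf{w}[k] \approx \frac{1}{\mu}\left(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}[k] - \mathbf{y}\mathbf{y}^H[k]\right)\mathbf{w}[k]. \qquad (64)$$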



* In the sequel, the subscripts 0 : M - 1 in the adaptive filter $\mathbf{w}_{0:M-1}$ and the input vector $\mathbf{y}_{0:M-1}$ are
omitted for the sake of conciseness.



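This results in the update formula

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\left\{\mathbf{y}[k]\left(y_0^*[k-\Delta] - \mathbf{y}^H[k]\,\mathbf{w}[k]\right) - \frac{1}{\mu}\left(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}[k] - \mathbf{y}\mathbf{y}^H[k]\right)\mathbf{w}[k]\right\} \qquad (65)$$

during periods of noise only. In the sequel, a normalized step size $\rho$ is used, i.e.,

$$\rho = \frac{\rho'}{\frac{1}{\mu}\left|\mathbf{y}^H_{buf_1}\mathbf{y}_{buf_1}[k] - \mathbf{y}^H\mathbf{y}[k]\right| + \mathbf{y}^H\mathbf{y}[k] + \delta}, \qquad (66)$$

where $\delta$ is a very small constant. The absolute value $|\mathbf{y}^H_{buf_1}\mathbf{y}_{buf_1} - \mathbf{y}^H\mathbf{y}|$ has been inserted to guarantee
a positive valued estimate of the clean speech energy $\mathbf{y}^{s,H}\mathbf{y}^s[k]$. Additional storage of noise-only vectors
$\mathbf{y}_{buf_2} \in \mathbb{C}^{NL\times 1}$ in a second buffer $\mathbf{B}_2 \in \mathbb{R}^{N\times L_{buf_2}}$ allows $\mathbf{w}$ to be adapted also during periods of speech + noise,
using

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\left\{\mathbf{y}_{buf_2}[k]\left(y^*_{0,buf_2}[k-\Delta] - \mathbf{y}^H_{buf_2}[k]\,\mathbf{w}[k]\right) + \frac{1}{\mu}\left(\mathbf{y}_{buf_2}\mathbf{y}^H_{buf_2}[k] - \mathbf{y}\mathbf{y}^H[k]\right)\mathbf{w}[k]\right\} \qquad (67)$$

with

$$\rho = \frac{\rho'}{\frac{1}{\mu}\left|\mathbf{y}^H\mathbf{y}[k] - \mathbf{y}^H_{buf_2}\mathbf{y}_{buf_2}[k]\right| + \mathbf{y}^H_{buf_2}\mathbf{y}_{buf_2}[k] + \delta}. \qquad (68)$$

In the sequel, we will - for reasons of conciseness - only consider the update procedure of the time-domain
stochastic gradient algorithms during noise only, hence, $\mathbf{y}[k] = \mathbf{y}^n[k]$. The extension towards updating
during speech + noise periods with the use of a second, noise-only buffer $\mathbf{B}_2$ is straightforward: the equations
are found by replacing the noise-only input vectors $\mathbf{y}[k]$ by $\mathbf{y}_{buf_2}[k]$ and the speech + noise vectors $\mathbf{y}_{buf_1}[k]$
by the input speech + noise vector $\mathbf{y}[k]$.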
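
A minimal sketch of one noise-only update step of this kind is given below, for real-valued signals. It follows the structure described in the text: an error-driven term, minus a regularization term built from a buffered speech + noise vector and the current noise vector, with a normalized step size. The exact form of the normalization (absolute energy difference weighted by 1/mu, plus input energy, plus a small constant) is an assumption here, as are all function and parameter names.

```python
def dot(a, b):
    """Inner product of two real-valued vectors."""
    return sum(x * y for x, y in zip(a, b))


def sg_update_noise_only(w, y, y0_delayed, y_buf, inv_mu, rho_prime, delta=1e-8):
    """One stochastic gradient step during a noise-only period.

    w          : current filter coefficients (stacked over channels and taps)
    y          : current noise-only input vector
    y0_delayed : delayed speech reference sample (noise only in this period)
    y_buf      : speech + noise vector taken from the circular buffer
    inv_mu     : 1/mu, the weight on the speech distortion term
    """
    err = y0_delayed - dot(y, w)              # output error
    proj_buf = dot(y_buf, w)                  # y_buf^T w
    proj_y = dot(y, w)                        # y^T w
    # Assumed normalization: estimated clean-speech energy (made non-negative
    # by the absolute value) weighted by 1/mu, plus the input energy.
    norm = inv_mu * abs(dot(y_buf, y_buf) - dot(y, y)) + dot(y, y) + delta
    rho = rho_prime / norm
    return [
        wi + rho * (yi * err - inv_mu * (ybi * proj_buf - yi * proj_y))
        for wi, yi, ybi in zip(w, y, y_buf)
    ]


# Hypothetical numbers for illustration only.
w_new = sg_update_noise_only(
    w=[0.0, 0.0], y=[1.0, 0.0], y0_delayed=1.0,
    y_buf=[0.5, 0.5], inv_mu=0.0, rho_prime=1.0,
)
print(w_new)
```

With `inv_mu = 0.0` the regularization term vanishes and the step is an ordinary NLMS-style update, which matches the statement that the GSC update is recovered in that case.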
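Using†

$$\mathbf{w}_{opt} = \left(\frac{1}{\mu}\,\mathcal{E}\{\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}\} + \left(1 - \frac{1}{\mu}\right)\mathcal{E}\{\mathbf{y}\mathbf{y}^H\}\right)^{-1} \mathcal{E}\{\mathbf{y}\,y_0^*[k-\Delta]\}, \qquad (69)$$

where $\mathbf{y}$ is a noise-only vector, and (65), it can be shown that

$$\mathcal{E}\{\mathbf{w}[k+1] - \mathbf{w}_{opt}\} = \left(\mathbf{I} - \rho\,\mathcal{E}\left\{\frac{1}{\mu}\,\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + \left(1 - \frac{1}{\mu}\right)\mathbf{y}\mathbf{y}^H\right\}\right)\mathcal{E}\{\mathbf{w}[k] - \mathbf{w}_{opt}\}. \qquad (70)$$

Hence, the algorithm (65)-(67) is convergent in the mean provided that the step size $\rho$ is smaller than $\frac{2}{\lambda_{max}}$,
with $\lambda_{max}$ the maximum eigenvalue of $\mathcal{E}\{\frac{1}{\mu}\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + (1 - \frac{1}{\mu})\mathbf{y}\mathbf{y}^H\}$. The similarity of (65) with standard
NLMS lets us presume that setting $\rho < \frac{2}{\sum_{i}\lambda_i}$, with $\lambda_i$, i = 1, ..., NL the eigenvalues of
$\mathcal{E}\{\frac{1}{\mu}\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + (1 - \frac{1}{\mu})\mathbf{y}\mathbf{y}^H\} \in \mathbb{R}^{NL\times NL}$, or - in case of FIR filters - setting

$$\rho < \frac{2}{\frac{1}{\mu}\,\mathcal{E}\{\mathbf{y}^H_{buf_1}\mathbf{y}_{buf_1}[k]\} + \left(1 - \frac{1}{\mu}\right)\mathcal{E}\{\mathbf{y}^H\mathbf{y}[k]\}} \qquad (71)$$

guarantees convergence in the mean square. Equation (71) explains the normalization (66) and (68) for the
step size $\rho$.

† When the second order statistics of the noise are short-term stationary, $\mathbf{w}_{opt}$ equals (36).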
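
To make the step-size condition concrete, the sketch below estimates an upper bound on the step size from sample averages of the vector energies. It assumes the bound has the trace-like form 2 / ((1/mu) E{energy of buffered speech + noise vector} + (1 - 1/mu) E{energy of noise-only vector}); that form, the sample-average estimator, and all names are assumptions made for illustration.

```python
def mean_energy(vectors):
    """Sample average of the squared norm over a list of real-valued vectors."""
    return sum(sum(x * x for x in v) for v in vectors) / len(vectors)


def step_size_bound(buf_vectors, noise_vectors, inv_mu):
    """Assumed upper bound on the step size for mean-square convergence."""
    weighted = inv_mu * mean_energy(buf_vectors) + (1.0 - inv_mu) * mean_energy(noise_vectors)
    if weighted <= 0.0:
        raise ValueError("weighted energy must be positive")
    return 2.0 / weighted


# Hypothetical vectors for illustration only.
buf = [[1.0, 1.0], [1.0, -1.0]]      # speech + noise vectors (energy 2 each)
noise = [[1.0, 0.0], [0.0, 1.0]]     # noise-only vectors (energy 1 each)
print(step_size_bound(buf, noise, inv_mu=0.5))
```

The bound shrinks as the input energy grows, which is why the step size is normalized by an energy estimate in practice.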

However, since generally

$$\mathbf{y}\mathbf{y}^H[k] \neq \mathbf{y}^n_{buf_1}\mathbf{y}^{n,H}_{buf_1}[k], \qquad (72)$$

the instantaneous gradient estimate in (65) is - compared to (63) - additionally perturbed by

$$\frac{1}{\mu}\left(\mathbf{y}\mathbf{y}^H[k] - \mathbf{y}^n_{buf_1}\mathbf{y}^{n,H}_{buf_1}[k]\right)\mathbf{w}[k], \qquad (73)$$

for $\mu \neq \infty$. Hence, for $\mu \neq \infty$, the update equation (65)-(67) suffers from a larger residual excess error
than (63). The additional excess error grows for decreasing $\mu$, increasing step size $\rho$ and increasing vector
length LN of the vector $\mathbf{y}$, with L the filter length per channel and N the number of inputs to the adaptive
filter. It is expected to be especially large for highly time-varying noise, e.g., multi-talker babble noise.

4.1.2 NLMS based algorithm

In [26], an LMS based implementation of the SDW-MWF has been proposed. Besides (64), some additional
independence assumptions are made. Applying these assumptions to (65)-(67) results in an LMS based
implementation of the SP-SDW-MWF similar to [26]. Assuming that



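$$\frac{\sqrt{1/\mu}}{\sqrt{1 - 1/\mu}}\,\mathbf{y}_{buf_1}[k]\,y_0^*[k-\Delta] = 0 \qquad (74)$$

$$\sqrt{\frac{1}{\mu}\left(1 - \frac{1}{\mu}\right)}\left(\mathbf{y}[k]\,\mathbf{y}^H_{buf_1}[k] + \mathbf{y}_{buf_1}[k]\,\mathbf{y}^H[k]\right)\mathbf{w}[k] = 0 \qquad (75)$$

hold, (65) can be simplified to

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \frac{\rho'}{\mathbf{x}^H[k]\,\mathbf{x}[k] + \delta}\,\mathbf{x}[k]\left(d^*[k] - \mathbf{x}^H[k]\,\mathbf{w}[k]\right), \qquad (76)$$

where

$$d[k] = \frac{y_0[k-\Delta]}{\sqrt{1 - 1/\mu}}; \qquad \mathbf{x}[k] = \sqrt{\frac{1}{\mu}}\,\mathbf{y}_{buf_1}[k] + \sqrt{1 - \frac{1}{\mu}}\,\mathbf{y}[k] \qquad (77)$$

during periods of noise only (i.e., $\mathbf{y}[k] = \mathbf{y}^n[k]$). During speech + noise (i.e., $\mathbf{y}[k] = \mathbf{y}^s[k] + \mathbf{y}^n[k]$), $d[k]$
and $\mathbf{x}[k]$ in (76) are set to

$$d[k] = \frac{y_{0,buf_2}[k-\Delta]}{\sqrt{1 - 1/\mu}}; \qquad \mathbf{x}[k] = \sqrt{1 - \frac{1}{\mu}}\,\mathbf{y}_{buf_2}[k] + \sqrt{\frac{1}{\mu}}\,\mathbf{y}[k]. \qquad (78)$$

Equations (74) and (75) assume that - besides speech and noise vectors - also noise vectors at different
time instants are mutually uncorrelated. In practice, (74) and (75) do not hold, especially when the weighting
factors in (74) and (75) are large. Hence, compared to (65)-(67), performance is expected to be worse.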

In addition, equations (76)-(78) can - in contrast to (65) - not be applied for $\mu < 1$. Compared to (65), no
significant complexity reduction is achieved. The LMS based updating (76) requires 4NL + 3 Multiply-Accumulate
(MAC) operations per sample*, whereas update formula (65) requires (4NL + 5) MAC per sample. The
computation of the normalized step size in (76) requires + 2 less MAC per sample than in (65). 

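4.1.3 Frequency-domain implementation

As stated before, the stochastic gradient algorithms (65)-(67) and (76) are expected to suffer from a large
excess error for large $\frac{1}{\mu}$ and/or highly time-varying noise, due to a large difference between the rank-one
noise correlation matrices $\mathbf{y}^n\mathbf{y}^{n,H}[k]$ measured at different time instants k. The gradient estimate can be
improved by replacing

$$\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}[k] - \mathbf{y}\mathbf{y}^H[k] \qquad (79)$$

in (65) with the time-average

$$\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}[l] - \frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}\mathbf{y}^H[l], \qquad (80)$$

where $\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}[l]$ is updated during periods of speech + noise and $\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}\mathbf{y}^H[l]$
during periods of noise only. However, this would require expensive matrix operations. A block-based
implementation intrinsically performs this averaging:

$$\mathbf{w}[(k+1)K] = \mathbf{w}[kK] + \frac{\rho}{K}\left[\sum_{i=0}^{K-1}\mathbf{y}[kK+i]\left(y_0^*[kK+i-\Delta] - \mathbf{y}^H[kK+i]\,\mathbf{w}[kK]\right) - \frac{1}{\mu}\sum_{i=0}^{K-1}\left(\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}[kK+i] - \mathbf{y}\mathbf{y}^H[kK+i]\right)\mathbf{w}[kK]\right]. \qquad (81)$$

* Note that the output $y_0[k-\Delta] - \mathbf{w}^H\mathbf{y}[k]$ of the algorithm still has to be computed.

The gradient and hence also $\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}[k] - \mathbf{y}\mathbf{y}^H[k]$ is averaged over K iterations prior to making adjustments
to $\mathbf{w}$. This goes at the expense of a reduced (i.e. by a factor K) convergence rate.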
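
The block-based idea can be sketched as follows for real-valued signals: the gradient is accumulated over K samples while the filter is held fixed, and a single averaged update is applied at the end of the block. This is an illustrative time-domain sketch only (the text goes on to a frequency-domain realization), and the function and variable names are hypothetical.

```python
def dot(a, b):
    """Inner product of two real-valued vectors."""
    return sum(x * y for x, y in zip(a, b))


def block_update(w, ys, y0s, y_bufs, inv_mu, rho):
    """Average the instantaneous gradient over one block of K samples.

    ys     : K noise-only input vectors
    y0s    : K delayed speech reference samples
    y_bufs : K buffered speech + noise vectors
    The filter w is kept fixed within the block.
    """
    K = len(ys)
    grad = [0.0] * len(w)
    for y, y0, y_buf in zip(ys, y0s, y_bufs):
        err = y0 - dot(y, w)
        proj_buf = dot(y_buf, w)
        proj_y = dot(y, w)
        for n in range(len(w)):
            grad[n] += y[n] * err - inv_mu * (y_buf[n] * proj_buf - y[n] * proj_y)
    return [wn + (rho / K) * g for wn, g in zip(w, grad)]


# Hypothetical block of K = 2 samples, for illustration only.
w_next = block_update(
    w=[0.0, 0.0],
    ys=[[1.0, 0.0], [0.0, 1.0]],
    y0s=[1.0, 2.0],
    y_bufs=[[1.0, 0.0], [0.0, 1.0]],
    inv_mu=1.0,
    rho=1.0,
)
print(w_next)
```

Averaging over the block smooths the difference of rank-one matrices, at the price of only one filter adjustment per K samples.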

The block-based implementation is computationally more efficient when it is implemented in the frequency
domain, especially for large filter lengths. In addition, in a frequency-domain implementation, each
frequency bin gets its own step size, resulting in faster convergence compared to a time-domain implementation
while not degrading the time-domain MSE. Although the frequency and time-domain implementation
obtain the same MSE, the improvement in $\mathrm{SNR}_{\mathrm{intellig}}$, which is determined by the excess errors in each
frequency bin, may be different. In a time-domain implementation, one common step size $\rho$ is used for the
different frequency bins. The convergence rate depends on the eigenvalue spread of the correlation matrix of
the input signals to the adaptive filter and hence on the power spectrum of the input signal. In frequency bins
with little power, this common step size will be smaller than in the frequency-domain approach, resulting in
slower convergence and less excess error in that bin. In frequency bins with large power, on the other hand,
this common step size will be larger than in the frequency-domain approach, resulting in larger LMS excess
error in that frequency bin. Hence, in a time-domain implementation, the power spectrum of the input
signals not only determines the convergence rate but also the improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$. In a
frequency-domain implementation, the step size is normalized in each frequency bin, so that the different bins have
a similar convergence rate and hence also excess error. Hence, the SNR improvement in each frequency
bin is more controlled (i.e. less dependent on the input power spectrum). Since signal model errors (e.g.
microphone mismatch) modify the power spectrum of the noise references and hence the convergence rate
and improvement $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ of a time-domain implementation, frequency-domain implementations are
more appropriate to evaluate the performance of the algorithms for different signal model errors.

Algorithm 1 and Algorithm 2 summarize a frequency-domain implementation based on overlap-save
of (65)-(67) and (76), respectively. Algorithm 1 requires (3N + 4) FFTs of length 2L and Algorithm 2
(3N + 3) FFTs. By storing the FFT-transformed speech + noise and noise-only vectors in the buffers
$\mathbf{B}_1$ and $\mathbf{B}_2$, respectively, instead of storing the time-domain vectors, N FFT
operations have been saved. When adapting during speech + noise, also the time-domain vector

$$\left[y_0[kL-\Delta] \;\; \ldots \;\; y_0[kL-\Delta+L-1]\right]^T \qquad (82)$$

should then be stored in an additional buffer $\mathbf{B}_{2,0}$ during periods of noise only, which - for
N = M - results in additional storage compared to when the time-domain vectors are
stored into the buffers $\mathbf{B}_1$ and $\mathbf{B}_2$.

Remark: In Algorithm 1 and 2 a common trade-off parameter $\mu$ is used in all frequency bins. Alternatively,
a different setting for $\mu$ can be used in different frequency bins. E.g., for the SP-SDW-MWF with $\mathbf{w}_0 = 0$,
$\mu$ could be set to $\infty$ at those frequencies where the GSC is sufficiently robust, e.g., for small-sized arrays at



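Algorithm 1 Frequency-domain stochastic gradient SP-SDW-MWF based on overlap-save

Initialization:
    $P_m[0] = \delta_m$, m = 0, ..., 2L - 1

Matrix definitions:
    $\mathbf{g} = \begin{bmatrix} \mathbf{I}_L & \mathbf{0}_L \\ \mathbf{0}_L & \mathbf{0}_L \end{bmatrix}$; $\mathbf{k} = [\mathbf{0}_L \;\; \mathbf{I}_L]$; $\mathbf{F}$ = 2L x 2L DFT matrix

For each new block of NL input samples:

• If noise detected:
    1. $\mathbf{F}[y_i[kL-L] \; \ldots \; y_i[kL+L-1]]^T$, i = M - N, ..., M - 1 → noise buffer $\mathbf{B}_2$
       $[y_0[kL-\Delta] \; \ldots \; y_0[kL-\Delta+L-1]]^T$ → noise buffer $\mathbf{B}_{2,0}$
    2. $\mathbf{Y}_i^n[k] = \mathrm{diag}\{\mathbf{F}[y_i[kL-L] \; \ldots \; y_i[kL+L-1]]^T\}$, i = M - N, ..., M - 1
       $\mathbf{Y}_i[k] = \mathrm{diag}\{[\mathbf{B}_1(i,0) \; \ldots \; \mathbf{B}_1(i,2L-1)]^T\}$, i = M - N, ..., M - 1
       cyclically shift each row i of $\mathbf{B}_1$ over 2L samples, i = M - N, ..., M - 1
       $\mathbf{d}[k] = [0 \; \ldots \; 0 \;\; y_0[kL-\Delta] \; \ldots \; y_0[kL-\Delta+L-1]]^T$

• If speech detected:
    1. $\mathbf{F}[y_i[kL-L] \; \ldots \; y_i[kL+L-1]]^T$, i = M - N, ..., M - 1 → speech + noise buffer $\mathbf{B}_1$
    2. $\mathbf{Y}_i^n[k] = \mathrm{diag}\{[\mathbf{B}_2(i,0) \; \ldots \; \mathbf{B}_2(i,2L-1)]^T\}$, i = M - N, ..., M - 1
       cyclically shift each row i of $\mathbf{B}_2$ over 2L samples, i = M - N, ..., M - 1
       $\mathbf{Y}_i[k] = \mathrm{diag}\{\mathbf{F}[y_i[kL-L] \; \ldots \; y_i[kL+L-1]]^T\}$, i = M - N, ..., M - 1
       $\mathbf{d}[k] = [0 \; \ldots \; 0 \;\; \mathbf{B}_{2,0}(1,0) \; \ldots \; \mathbf{B}_{2,0}(1,L-1)]^T$
       cyclically shift $\mathbf{B}_{2,0}$ over L samples

• Update formula:
    1. $\mathbf{e}_1[k] = \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j^n[k]\mathbf{W}_j[k] = \mathbf{y}_{out,1}$
       $\mathbf{e}_2[k] = \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j[k]\mathbf{W}_j[k] = \mathbf{y}_{out,2}$
       $\mathbf{e}[k] = \mathbf{d}[k] - \mathbf{e}_1[k]$
       $\mathbf{E}[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}[k]$; $\mathbf{E}_1[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}_1[k]$; $\mathbf{E}_2[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}_2[k]$
    2. $\mathbf{\Lambda}[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$
    3. $\mathbf{W}_i[k+1] = \mathbf{W}_i[k] + \mathbf{F}\mathbf{g}\mathbf{F}^{-1}\mathbf{\Lambda}[k]\left\{\mathbf{Y}_i^{n,H}[k]\mathbf{E}[k] - \frac{1}{\mu}\left(\mathbf{Y}_i^H[k]\mathbf{E}_2[k] - \mathbf{Y}_i^{n,H}[k]\mathbf{E}_1[k]\right)\right\}$, i = M - N, ..., M - 1

• Output: $\mathbf{y}_0[k] = [y_0[kL-\Delta] \; \ldots \; y_0[kL-\Delta+L-1]]^T$
    - If noise detected: $\mathbf{y}_{out}[k] = \mathbf{y}_0[k] - \mathbf{y}_{out,1}[k]$
    - If speech detected: $\mathbf{y}_{out}[k] = \mathbf{y}_0[k] - \mathbf{y}_{out,2}[k]$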




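Algorithm 2: Frequency-domain NLMS based SP-SDW-MWF based on overlap-save.

Initialization:

    $W_i[0] = [\,0 \;\cdots\; 0\,]^T$, $i = M-N, \ldots, M-1$
    $P_m[0] = \delta_m$, $m = 0, \ldots, 2L-1$

Matrix definitions:

    $g = \begin{bmatrix} I_L & 0_L \\ 0_L & 0_L \end{bmatrix}$; $k = [\,0_L \;\; I_L\,]$; $F$ = $2L \times 2L$ DFT matrix

For each new block of NL input samples:

• If noise detected:

    1. $[\,y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]\,]^T$ → noise buffer $B_{2,0}$

    2. $Y_i[k] = \mathrm{diag}\{F\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T\}$, $i = M-N, \ldots, M-1$
       $Y_{i,buf_1}[k] = \mathrm{diag}\{[\,B_1(i,0) \;\cdots\; B_1(i,2L-1)\,]^T\}$, $i = M-N, \ldots, M-1$
       $X_i[k] = \sqrt{1-\frac{1}{\mu}}\; Y_i[k] + \frac{1}{\sqrt{\mu}}\; Y_{i,buf_1}[k]$, $i = M-N, \ldots, M-1$
       cyclically shift each row $i$ of buffer $B_1$ over 2L samples

• If speech detected:

    1. $F\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T$, $i = M-N, \ldots, M-1$ → speech + noise buffer $B_1$

    2. $Y_i[k] = \mathrm{diag}\{F\,[\,y_i[kL-L] \;\cdots\; y_i[kL+L-1]\,]^T\}$, $i = M-N, \ldots, M-1$
       $Y_{i,buf_2}[k] = \mathrm{diag}\{[\,B_2(i,0) \;\cdots\; B_2(i,2L-1)\,]^T\}$, $i = M-N, \ldots, M-1$
       cyclically shift each row $i$ of buffer $B_2$ over 2L samples
       $d[k] = \frac{1}{\sqrt{1-\frac{1}{\mu}}}\, [\,0 \;\cdots\; 0 \;\; B_{2,0}(1,0) \;\cdots\; B_{2,0}(1,L-1)\,]^T$
       cyclically shift $B_{2,0}$ over L samples

• Update formula:

    1. $E[k] = F k^T \left( d[k] - k F^{-1} \sum_{j=M-N}^{M-1} X_j[k]\, W_j[k] \right)$

    2. $\Lambda[k] = \frac{2\rho'}{L}\, \mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$
       $P_m[k] = \gamma P_m[k-1] + (1-\gamma) \sum_{j=M-N}^{M-1} |X_{j,m}[k]|^2$

    3. $W_i[k+1] = W_i[k] + F g F^{-1} \Lambda[k]\, X_i^{H}[k]\, E[k]$, $i = M-N, \ldots, M-1$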



high frequencies. In that case, only a few frequency components of $Y_i$ should be stored in the speech + noise buffer.

4.2 Improvement of stochastic gradient algorithm

To achieve a reliable estimate (80) of the average correlation matrix $\mathcal{E}\{y^s y^{s,H}\}$ in highly time-varying noise scenarios (e.g., multi-talker babble), K should be much larger than LN. Hence, the averaging in the block-based or frequency-domain implementation proposed in Section 4.1 does not suffice to obtain a good estimate of $\mathcal{E}\{y^s y^{s,H}\}$. In this section, we show that the performance of the stochastic gradient algorithm is improved by applying a low-pass filter to the part of the gradient estimate that takes speech distortion into account, i.e., the term $r[k]$ in (65). The low-pass filtering avoids a highly time-varying distortion of the desired speech component while not degrading the tracking performance needed in non-stationary noise scenarios.

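4.2.1 Concept

Define $w_s$ as

$$w_s = w - w_n \qquad (83)$$

$$w_s \in \mathrm{Range}\{\mathcal{E}\{y^s y^{s,H}\}\} \qquad (84)$$

$$\mathcal{E}\{y^s y^{s,H}\}\, w_n = 0. \qquad (85)$$

Then, the desired speech component $y^s_{out}[k]$ at the output equals

$$y^s_{out}[k] = y_0^s[k-\Delta] - w_s^H\, y^s[k]. \qquad (86)$$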
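The decomposition (83)-(85) can be illustrated numerically. The following is a minimal sketch, assuming real-valued data and a rank-deficient speech correlation matrix built from two random vectors (an assumption made purely for illustration). The function name `split_filter` and all variable names are hypothetical. The sketch projects a filter $w$ onto the range of the correlation matrix and checks that the remaining part $w_n$ does not act on a speech vector.

```python
import numpy as np

def split_filter(w, R_s, tol=1e-10):
    """Split w into w_s in Range{R_s} and w_n in the null space of R_s.

    R_s stands in for the (symmetric, positive semi-definite) speech
    correlation matrix E{y^s y^{s,H}}; here it is simply a given array.
    """
    eigval, eigvec = np.linalg.eigh(R_s)
    # Eigenvectors with a non-negligible eigenvalue span Range{R_s}.
    U = eigvec[:, eigval > tol * eigval.max()]
    w_s = U @ (U.conj().T @ w)   # orthogonal projection onto the range
    w_n = w - w_s                # remainder lies in the null space
    return w_s, w_n

rng = np.random.default_rng(0)
dim, rank = 8, 2
A = rng.standard_normal((dim, rank))   # hypothetical speech subspace
R_s = A @ A.T                          # rank-2 correlation matrix
w = rng.standard_normal(dim)

w_s, w_n = split_filter(w, R_s)
y_s = A @ rng.standard_normal(rank)    # a speech vector in Range{R_s}

print(np.allclose(R_s @ w_n, 0.0))     # condition (85) holds
print(np.isclose(w @ y_s, w_s @ y_s))  # only w_s acts on the speech
```

Because $w_n$ is orthogonal to every speech vector, changes in $w_n$ leave the speech component at the output untouched, which is why only $w_s$ appears in (86).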

Assume that $w_s$ varies slowly in time. This is desired since a fast-changing $w_s$ results in a highly time-varying distortion of the desired speech, and may thus harm sound quality. In addition, in hearing aid applications the average correlation matrix $\mathcal{E}\{y^s y^{s,H}\}$ is slowly time-varying, as microphone characteristics, room acoustics and the average desired speaker position do not change quickly in time. Fast changes in the noise scenario can be tracked by the filter $w_n$. This will be illustrated in Section 4.3.
Then,

$$\mathcal{E}\{y^s y^{s,H}\}\, w[k] = \mathcal{E}\{y^s y^{s,H}\}\, w_s \qquad (87)$$

can be approximated by



(88) 



tl^^^%'^^^°^'^X'T^J^^y': r."'^" '"^"^"y Of «to noise, s. 

A large $K \gg LN$ in block-LMS would result in too slow a convergence rate.
«r .I^^Lltrf- ^^f^;^ *e speech leakage y. in the noise references does not cover the whole fiaqaenqr speotnim 
T K ^1° ^ ^'^^^ ^ direct-to-icverberant B«io of the desired^Ss 
where $y$ is a vector during noise only. Using the independence assumption [40]

$$\mathcal{E}\{y^s y^{s,H}[k]\, w_n[k]\} \approx \mathcal{E}\{y^s y^{s,H}[k]\}\, \mathcal{E}\{w_n[k]\} \qquad (89)$$

and $\mathcal{E}\{y^s y^{s,H}\} = \mathcal{E}\{y_{buf_1} y_{buf_1}^H - y y^H\}$, we find that

$$\mathcal{E}\{(y_{buf_1} y_{buf_1}^H - y y^H)\, w_s\} = \mathcal{E}\{(y_{buf_1} y_{buf_1}^H - y y^H)\, w[k]\}. \qquad (90)$$

Replacing the expectation value by time averaging, $\mathcal{E}\{y^s y^{s,H}\}\, w[k]$ can be estimated as

$$\frac{1}{K} \sum_{l=k-K+1}^{k} \left( y_{buf_1} y_{buf_1}^H[l] - y y^H[l] \right) w[l] \qquad (91)$$

during noise only. The value K determines the convergence rate of the filter $w_s$.

Remark: In order to obtain a good estimate of $\mathcal{E}\{y^s y^{s,H}\}$, the long-term averaged noise correlation matrices $\frac{1}{K} \sum_{l=k-K+1}^{k} y y^H[l]$ and $\frac{1}{K} \sum_{l=k-K+1}^{k} y^n_{buf_1} y^{n,H}_{buf_1}[l]$ should not differ too much from each other. This does not require that the second-order statistics of the noise source are stationary for about K time samples. It suffices that they are short-term stationary so that they can be estimated during noise-only periods.

The averaging operation (91) is performed by applying the following low-pass filter to the term $r[k] = \frac{1}{\mu} (y_{buf_1} y_{buf_1}^H - y y^H)\, w[k]$ in (65):

$$r[k] = \lambda\, r[k-1] + (1-\lambda)\, \frac{1}{\mu} \left( y_{buf_1} y_{buf_1}^H - y y^H \right) w[k], \qquad (92)$$

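where $\lambda < 1$. This corresponds to an averaging window K of about $\frac{1}{1-\lambda}$ samples. The normalized step size $\rho$ is modified into

$$\rho = \frac{\rho'}{r_{avg}[k] + y^H y + \delta} \qquad (93)$$

$$r_{avg}[k] = \lambda\, r_{avg}[k-1] + (1-\lambda)\, \frac{1}{\mu} \left| y_{buf_1}^H y_{buf_1} - y^H y \right| \qquad (94)$$
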

Compared to (65), (92) requires 3NL - 1 additional MAC and extra storage of an NL × 1 vector $r$.
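The recursion (92) can be sketched in a few lines. This is a minimal illustration under stated assumptions: real-valued signals, a hypothetical function name `lowpass_regularization`, and constant stand-in input vectors chosen so that the steady state can be checked exactly. It is not the implementation used in the experiments.

```python
import numpy as np

def lowpass_regularization(r_prev, y_buf1, y, w, mu, lam):
    """One step of the low-pass filter (92) for real-valued signals.

    r_prev : previous filtered term r[k-1]
    y_buf1 : speech + noise input vector taken from the buffer
    y      : current noise-only input vector
    w      : current filter vector w[k]
    """
    # (y_buf1 y_buf1^H - y y^H) w, evaluated without forming outer products.
    instantaneous = (y_buf1 * (y_buf1 @ w) - y * (y @ w)) / mu
    return lam * r_prev + (1.0 - lam) * instantaneous

def averaging_window(lam):
    """Approximate length K of the exponential averaging window."""
    return 1.0 / (1.0 - lam)

# Constant stand-in inputs: r[k] then converges to the instantaneous term.
y_buf1 = np.array([1.0, -2.0, 0.5])
y = np.array([0.3, 0.1, -0.4])
w = np.array([0.2, 0.0, -1.0])
mu, lam = 2.0, 0.5

r = np.zeros(3)
for _ in range(60):
    r = lowpass_regularization(r, y_buf1, y, w, mu, lam)

target = (y_buf1 * (y_buf1 @ w) - y * (y @ w)) / mu
print(np.allclose(r, target))
print(averaging_window(0.9998))   # roughly 5000 samples
```

Evaluating the inner products first keeps the cost linear in the vector length. For the value $\lambda = 0.9998$ used later in the experiments, the window is on the order of 5000 samples.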
4.2.2 Frequency-domain

Equation (92) can be extended to the frequency domain. The update equation for $W_i[k+1]$ in Algorithm 1 then becomes:

$$W_i[k+1] = W_i[k] + F g F^{-1} \Lambda[k] \left( Y_i^{n,H}[k]\, E[k] - R_i[k] \right);$$

$$R_i[k] = \lambda R_i[k-1] + (1-\lambda)\, \frac{1}{\mu} \left( Y_i^{H}[k]\, E_2[k] - Y_i^{n,H}[k]\, E_1[k] \right) \qquad (95)$$

Footnote to (91): The noise-only vector $y[k]$ should be replaced by $y_{buf_2}[k]$ and the speech + noise vector $y_{buf_1}[k]$ by $y[k]$ when adapting during periods of speech + noise.



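with

$$E[k] = F k^T \left( d[k] - k F^{-1} \sum_{j=M-N}^{M-1} Y_j^n[k]\, W_j[k] \right); \qquad (96)$$

(97)
(98)

and $\Lambda[k]$ computed as follows:

$$\Lambda[k] = \frac{2\rho'}{L}\, \mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$$

$$P_m[k] = P_{1,m}[k] + P_{2,m}[k]$$

$$P_{1,m}[k] = \sum_{j=M-N}^{M-1} |Y^n_{j,m}[k]|^2$$

$$P_{2,m}[k] = \lambda P_{2,m}[k-1] + (1-\lambda)\, \frac{1}{\mu} \left| \sum_{j=M-N}^{M-1} \left( |Y_{j,m}[k]|^2 - |Y^n_{j,m}[k]|^2 \right) \right|$$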

Compared to Algorithm 1, (95)-(98) requires one extra 2L-point FFT and 8NL - 2N - 2L extra MAC per L samples and additional memory storage of a 2NL × 1 real data vector. To obtain the same time constant in the averaging operation as in the time-domain version with K = 1, $\lambda$ should equal $\lambda^L$.

Experimental results in Section 4.3 will show that the performance of the stochastic gradient algorithm significantly improves by the low-pass filter, especially for large $\lambda$.
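The structure of the frequency-domain update (95) can be sketched as follows for a single channel. This is an illustrative sketch only: the diagonal matrices are represented by their diagonals, the operator $F g F^{-1}$ is realized with NumPy FFTs, and the function names and the random test data are hypothetical. Computing the error vectors and the step-size matrix is left out.

```python
import numpy as np

def gradient_constraint(G, L):
    """Apply F g F^{-1}: keep the first L time-domain taps, zero the rest."""
    g = np.fft.ifft(G)
    g[L:] = 0.0
    return np.fft.fft(g)

def fd_update(W, R_prev, Y, Yn, E, E1, E2, step, mu, lam, L):
    """One update of a single channel following the structure of (95).

    W, R_prev : length-2L frequency-domain filter and low-pass filtered term
    Y, Yn     : diagonals of Y_i[k] (speech + noise) and Y_i^n[k] (noise)
    E, E1, E2 : length-2L frequency-domain error vectors
    step      : diagonal of the step-size matrix Lambda[k]
    """
    R = lam * R_prev + (1.0 - lam) / mu * (np.conj(Y) * E2 - np.conj(Yn) * E1)
    W_next = W + gradient_constraint(step * (np.conj(Yn) * E - R), L)
    return W_next, R

rng = np.random.default_rng(0)
L = 4

def crand():
    return rng.standard_normal(2 * L) + 1j * rng.standard_normal(2 * L)

W, R_prev, Y, Yn, E, E1, E2 = (crand() for _ in range(7))
step = np.full(2 * L, 0.1)

W_next, R = fd_update(W, R_prev, Y, Yn, E, E1, E2, step, mu=2.0, lam=0.9, L=L)
delta_time = np.fft.ifft(W_next - W)
print(np.allclose(delta_time[L:], 0.0))   # update is confined to L taps
```

With `lam = 0` the filtered term reduces to the instantaneous regularization term, so the function then reproduces step 3 of Algorithm 1.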

4.2.3 Complexity of different stochastic gradient algorithms

Table 1 summarizes the computational complexity of the time-domain (TD) and frequency-domain (FD) stochastic gradient (SG) and NLMS based algorithms. Complexity is expressed as the number of real multiply-accumulates (MAC, counted as the number of multiply-accumulates, additions and multiplications), divisions (D), square roots (Sq) and absolute values (Abs). Comparison is made with standard NLMS and the NLMS based SPA. We assume that one complex multiplication is equivalent to 4 real multiplications and 2 real additions. A 2L-point FFT of a real input vector requires $2L \log_2 2L$ real MAC (assuming radix-2 FFT algorithms).

Table 1 indicates that the TD-SG without filter $w_0$ and the SPA are about twice as complex as the standard ANC. When applying a low-pass filter (LP) to the regularization term, the TD-SG algorithm has about three times the complexity of the ANC. The increase in complexity of the frequency-domain implementations is less.
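The ratios quoted above can be checked with a short calculation. The sketch below adds the update and step-size MAC counts of three time-domain rows as they are printed in Table 1. It ignores divisions and absolute values, and it assumes N = M - 1 for the algorithm without filter $w_0$. The function name and dictionary keys are hypothetical.

```python
def td_mac_per_sample(M, N, L):
    """Real MAC per sample (update + step size) for TD rows of Table 1."""
    return {
        "NLMS ANC": ((2 * M - 2) * L + 1) + (M - 1) * L,
        "SG": (4 * N * L + 6) + (2 * N * L + 2),
        "SG with LP": (7 * N * L + 4) + (2 * N * L + 4),
    }

M, L = 3, 32
macs = td_mac_per_sample(M, N=M - 1, L=L)   # N = M - 1: no filter w0
ref = macs["NLMS ANC"]
for name, count in macs.items():
    print(f"{name:12s} {count:5d} MAC  ratio {count / ref:.2f}")
```

For M = 3 and L = 32 the ratios come out close to 2 and 3, in line with the statement above.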



Table 1: Computational complexity of TD and FD NLMS and stochastic gradient algorithms (expressed as the number of real MAC, divisions (D), absolute values (Abs) and square roots (Sq) per sample).

      Algorithm              Update formula                                      Adaptation of step size
TD    NLMS ANC               ((2M - 2)L + 1) MAC                                 1 D + (M - 1)L MAC
      NLMS based SPA         (4(M - 1)L + 1) MAC + 1 D + 1 Sq                    1 D + (M - 1)L MAC
      SG                     (4NL + 6) MAC                                       1 D + 1 Abs + (2NL + 2) MAC
      NLMS based algorithm   (4NL + 3) MAC                                       1 D + NL MAC
      SG with LP             (7NL + 4) MAC                                       1 D + 1 Abs + (2NL + 4) MAC
FD    NLMS ANC               (10M - 7 - …) + (6M - 2) log2 2L MAC                1 D + (2M + 2) MAC
      NLMS based SPA         (14M - 11 - …) + (6M - 2) log2 2L MAC               1 D + (2M + 2) MAC
                             + 1/L Sq + 1/L D
      SG                     (18N + 6 - …) + (6N + 8) log2 2L MAC                1 D + 1 Abs + (4N + 4) MAC
      NLMS based algorithm   (16N + 4 - …) + (6N + 6) log2 2L MAC                1 D + (2N + 2) MAC
      SG with LP             (26N + 4 - …) + (6N + 10) log2 2L MAC               1 D + 1 Abs + (4N + 6) MAC



Remark: In Table 1 and Figure 8, the complexity of the time-domain and frequency-domain NLMS ANC and NLMS based SPA represents the complexity when the adaptive filter is only updated during noise only. If the adaptive filter is also updated during speech + noise using data from a noise buffer, the time-domain implementations require NL additional MAC per sample and the frequency-domain implementations require 2 additional FFT and (4L(M - 1) - 2(M - 1) + L) MAC per L samples.

As an illustration, Figure 8 plots the complexity (expressed as the number of mega operations per second (Mops)) of the time-domain and frequency-domain stochastic gradient algorithm with LP filtering as a function of L for M = 3 and a sampling frequency $f_s$ = 16 kHz. Comparison is made with the NLMS based ANC of the GSC and the SPA. The complexity of the FD SPA is not depicted, since for small M it is comparable to the cost of the FD-NLMS ANC. For L > 8, the frequency-domain implementations result in a significantly lower complexity compared to their time-domain equivalents. The computational cost of the FD stochastic gradient algorithm with LP is limited, making it a good alternative to the SPA for implementation in hearing aids.

4.3 Experimental results 

In this section, we evaluate the performance of the different FD stochastic gradient algorithms based on experimental results for a hearing aid application. Comparison is made with the FD-NLMS based SPA. For a fair comparison, the FD-NLMS based SPA is, like the stochastic gradient algorithms, also adapted during speech + noise using data from a noise buffer.

4.3.1 Set-up

A three-microphone behind-the-ear (BTE) hearing aid with three omnidirectional microphones (Knowles FG-3452) has been mounted on a dummy head in an office room. The interspacing between the first and the second microphone is about 1 cm and the interspacing between the second and the third microphone about 1.5 cm. The reverberation time $T_{60\,\mathrm{dB}}$ is about 700 ms for a speech-weighted noise. The desired speech signal and the noise signals are uncorrelated. The desired speech source consists of sentences spoken by a male speaker. Both the speech and the noise signal have a level of 70 dB SPL at the center of the head. The desired speech source and noise sources are positioned at a distance of 1 meter from the head: the speech source in front of the head, the noise sources at an angle $\theta$ w.r.t. the speech source. For evaluation purposes, the speech and noise signal have been recorded separately.

The microphone signals are pre-whitened prior to processing to improve intelligibility [37], and the output is accordingly de-whitened. In the experiments, the microphones have been calibrated by means of recordings of an anechoic speech-weighted noise signal positioned at 0°, measured while the microphone array was mounted on the head. A delay-and-sum beamformer is used as fixed beamformer since, in case of small microphone interspacing, it is robust to model errors. The blocking matrix B pairwise subtracts the time-aligned calibrated microphone signals.
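The spatial pre-processor described above can be sketched as follows. The sketch assumes that the microphone signals are already calibrated and time-aligned, that the delay-and-sum output is normalized by the number of microphones, and that the blocking matrix subtracts adjacent microphone pairs. These choices, like the function name `spatial_preprocessor`, are illustrative assumptions rather than details given in the text.

```python
import numpy as np

def spatial_preprocessor(x):
    """Fixed beamformer and blocking matrix for time-aligned signals.

    x : array of shape (M, T) with calibrated, time-aligned microphone signals.
    Returns the speech reference y0 (shape (T,)) and M - 1 noise references.
    """
    y0 = x.mean(axis=0)          # delay-and-sum after time alignment
    noise_refs = x[:-1] - x[1:]  # pairwise subtraction (blocking matrix)
    return y0, noise_refs

rng = np.random.default_rng(0)
speech = rng.standard_normal(1000)
x = np.tile(speech, (3, 1))      # identical speech on all three microphones

y0, noise_refs = spatial_preprocessor(x)
print(np.allclose(y0, speech))        # speech reference keeps the speech
print(np.allclose(noise_refs, 0.0))   # no speech leaks into the noise references
```

With perfectly matched microphones and an aligned source, the noise references contain no speech, which is the ideal case the GSC relies on.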

The performance of the FD stochastic gradient algorithms is evaluated for a filter length L = 32 taps per channel, $\rho' = 0.8$ and $\gamma = 0$. To exclude the effect of the spatial pre-processor, the performance measures are calculated w.r.t. the output of the fixed beamformer. The sensitivity of the algorithms against errors in the assumed signal model is illustrated for microphone mismatch, e.g., a gain mismatch $\Upsilon_2$ = 4 dB of the second microphone. Among the different possible signal model errors, especially microphone mismatch was found to be harmful to the performance of the GSC in a hearing aid application. In hearing aids, microphones are rarely matched in gain and phase. In [3], gain and phase differences between microphone characteristics of up to 6 dB and 10°, respectively, have been reported.
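A small numerical sketch shows why a gain mismatch matters. It assumes three time-aligned microphones that all pick up the same speech signal, applies a 4 dB gain to the second microphone, and forms adjacent pairwise differences as a stand-in for the blocking matrix. The helper name `db_to_gain` and the white-noise "speech" are assumptions for illustration.

```python
import numpy as np

def db_to_gain(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

rng = np.random.default_rng(1)
speech = rng.standard_normal(1000)   # stand-in for the desired speech
x = np.tile(speech, (3, 1))          # three time-aligned microphones
x[1] *= db_to_gain(4.0)              # 4 dB gain mismatch on microphone 2

leak = x[:-1] - x[1:]                # adjacent pairwise differences
leak_db = 10.0 * np.log10(np.mean(leak ** 2, axis=1) / np.mean(speech ** 2))
print(np.round(leak_db, 1))          # speech leakage level per noise reference
```

With a 4 dB mismatch both differences carry speech at about -4.7 dB relative to the source, so the noise references are far from speech-free and an unconstrained ANC would cancel part of the desired signal.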

4.3.2 Comparison of different FD stochastic gradient techniques

Figure 9(a) and (b) compare the performance of the different FD stochastic gradient (SG) SP-SDW-MWF algorithms without $w_0$ (i.e., the SDR-GSC) as a function of the trade-off parameter $1/\mu$ for a stationary and a non-stationary (e.g., multi-talker babble) noise source, respectively, at 90°. To analyze the impact of the approximation (64) on the performance, the result of a FD implementation of (63), which uses the clean speech, is depicted too. The stochastic gradient algorithm outperforms the NLMS based algorithm. Without low-pass (LP) filter, both algorithms achieve a worse improvement compared to (63), especially for large $1/\mu$. For a stationary speech-like noise source, the FD-SG algorithm does not suffer too much from approximation (64). In a highly time-varying noise scenario, such as multi-talker babble, the limited averaging of $r[k]$ in the FD implementation does not suffice to maintain the large noise reduction achieved by (63). The loss in noise reduction performance could be reduced by decreasing the step size, at the expense of a reduced convergence speed. Applying the low-pass filter (95) significantly improves the performance for all settings of the trade-off parameter, while changes in the noise scenario can still be tracked.

Figure 10 plots the improvement $\Delta SNR_{intellig}$ and the distortion $SD_{intellig}$ of the SP-SDW-MWF ($1/\mu = 0.5$) with and without filter $w_0$ for the babble noise scenario as a function of $\frac{1}{1-\lambda}$, where $\lambda$ is the exponential weighting factor of the LP filter (see (95)). Performance clearly improves for increasing $\lambda$. For small $\lambda$, the SP-SDW-MWF with $w_0$ suffers from a larger excess error, and hence a worse $\Delta SNR_{intellig}$, compared to the SP-SDW-MWF without $w_0$. This is due to the larger dimensions of $\mathcal{E}\{y^s y^{s,H}\}$.

The LP filter avoids that the desired speech is distorted by a highly time-varying filter $w_s$. In contrast to a decrease in step size $\rho'$, the LP filter does not compromise tracking of changes in the noise scenario. As an illustration, Figure 11 plots the convergence behavior of the FD stochastic gradient algorithm without $w_0$ (i.e., the SDR-GSC) for $\lambda = 0$ and $\lambda = 0.9998$, respectively, when the noise source position suddenly changes from 90° to 180°. A gain mismatch $\Upsilon_2$ of 4 dB was applied to the second microphone. To avoid fast fluctuations in the residual noise energy $\varepsilon_n^2$ and the speech distortion energy $\varepsilon_d^2$, the desired and interfering noise source in this experiment are stationary and speech-like. The upper figure depicts the residual noise energy $\varepsilon_n^2$ as a function of the number of input samples; the lower figure plots the residual speech distortion $\varepsilon_d^2$ during speech + noise periods as a function of the number of speech + noise samples. Both algorithms (i.e., $\lambda = 0$ and $\lambda = 0.9998$) have about the same convergence rate. When the change in position occurs, the algorithm with $\lambda = 0.9998$ even converges faster. For $\lambda = 0$, the approximation error (64) remains large for a while since the noise vectors in the buffer are not up to date. For $\lambda = 0.9998$, the impact of the instantaneous large approximation error is reduced thanks to the low-pass filter.

4.3.3 Comparison with SPA

Figure 12 and Figure 13 compare the performance of the FD stochastic gradient algorithm with LP filter ($\lambda = 0.9998$) and the FD-NLMS based SPA in a multiple noise source scenario. The noise scenario consists of 5 multi-talker babble noise sources positioned at angles 75°, 120°, 180°, 240°, 285° w.r.t. the desired source at 0°. To assess the sensitivity of the algorithms against errors in the assumed signal model, the influence of microphone mismatch, e.g., a gain mismatch $\Upsilon_2$ = 4 dB of the second microphone, on the performance is depicted too. In Figure 12, the improvement $\Delta SNR_{intellig}$ and the distortion $SD_{intellig}$ of the SP-SDW-MWF with and without filter $w_0$ are depicted as a function of the trade-off factor $1/\mu$. Figure 13 shows the results of the QIC-GSC

$$w^H w \le \beta^2 \qquad (99)$$

for different constraint values $\beta^2$, which is implemented using the FD-NLMS based SPA.
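The constraint (99) is typically enforced by rescaling the filter whenever its squared norm exceeds $\beta^2$, which is the idea behind the scaled projection algorithm. The following minimal sketch shows only that rescaling step; the NLMS update that would precede it is omitted, and the function name `scaled_projection` is hypothetical.

```python
import numpy as np

def scaled_projection(w, beta2):
    """Scale w back onto the ball w^H w <= beta2 if the constraint is violated."""
    norm2 = np.real(np.vdot(w, w))
    if norm2 > beta2:
        w = w * np.sqrt(beta2 / norm2)
    return w

w_large = np.array([3.0, 4.0])        # squared norm 25
w_small = np.array([0.1, 0.2])        # squared norm 0.05

print(scaled_projection(w_large, beta2=1.0))   # rescaled to unit norm
print(scaled_projection(w_small, beta2=1.0))   # already feasible, unchanged
```

A small $\beta^2$ limits the filter norm strongly, which protects against speech leakage but also limits the achievable noise reduction. This is the trade-off examined in Figure 13.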

Both the SPA and the stochastic gradient based SP-SDW-MWF increase the robustness of the GSC (i.e., the SP-SDW-MWF without $w_0$ and $1/\mu = 0$). For a given maximum allowable distortion $SD_{intellig}$, the SP-SDW-MWF with and without $w_0$ achieve a better noise reduction performance than the SPA. The performance of the SP-SDW-MWF with $w_0$ is, in contrast to the SP-SDW-MWF without $w_0$, not affected by microphone mismatch. In the absence of model errors, the SP-SDW-MWF with $w_0$ achieves a slightly worse performance than the SP-SDW-MWF without $w_0$. With $w_0$, the estimate of $\mathcal{E}\{y^s y^{s,H}\}$ is less accurate due to the larger dimensions of $\mathcal{E}\{y^s y^{s,H}\}$ (see also Figure 10).

In short, the proposed stochastic gradient implementation of the SP-SDW-MWF preserves the benefit of the SP-SDW-MWF over the QIC-GSC.

4.4 Conclusions

In this paper, we derived time-domain and frequency-domain stochastic gradient algorithms for the SP-SDW-MWF and compared their performance to the SPA. Starting from the cost function of the SP-SDW-MWF, a time-domain stochastic gradient algorithm has been derived in Section 4.1. In addition, the LMS based algorithm has been extended so that it applies to the SP-SDW-MWF. To increase convergence speed and reduce complexity, a frequency-domain implementation has been proposed. Both the stochastic gradient and the LMS based algorithm suffer from a large excess error when applied in highly time-varying noise scenarios. In Section 4.2, we show that the excess error is reduced by applying a low-pass filter to the part of the gradient estimate that limits speech distortion. The low-pass filtering avoids a highly time-varying distortion of the desired speech component while not degrading the tracking performance needed in time-varying noise scenarios. Section 4.3 compares the performance of the different frequency-domain stochastic gradient algorithms for a hearing aid application. The stochastic gradient SP-SDW-MWF outperforms the LMS based algorithm, while complexity is not increased. For a non-stationary noise scenario, the LMS based and stochastic gradient SP-SDW-MWF suffer from a reasonably large excess error. Experimental results show that the low-pass filtering significantly improves the performance of the stochastic gradient algorithm and does not compromise the tracking of changes in the noise scenario. In addition, experiments demonstrate that the proposed stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over the QIC-GSC. The limited computational cost and the better noise reduction performance of the proposed algorithm make it a good alternative to the SPA for implementation in hearing aids.